Re: Introduction to the mailing list

2019-10-04 Thread Manuela Chamda Tchakoute
Hello,

Thank you for reaching out. I will take a look at all the links sent and
get back to you in case of any queries.

Regards,
Manuela.

On Sat, Oct 5, 2019, 2:51 AM Kenneth Knowles  wrote:

> Welcome!
>
> On Fri, Oct 4, 2019 at 1:29 PM Ahmet Altay  wrote:
>
>> Welcome Manuela.
>>
>> You can look at (https://issues.apache.org/jira/browse/BEAM-2855) as the
>> starting point. It has links to 2 previously merged PRs that can serve as
>> starting points for writing new nexmark queries. There are nexmark queries
>> described in (https://cwiki.apache.org/confluence/display/BEAM/Nexmark).
>> Second step would be picking a few more of those and starting with the
>> implementation.
>>
>> +Ismaël Mejía  and I can help with PR reviews and
>> specific questions as well.
>>
>> Hope this helps.
>>
>> Ahmet
>>
>> On Fri, Oct 4, 2019 at 9:43 AM Thomas Weise  wrote:
>>
>>> Welcome, Manuela!
>>>
>>> To get familiar with the Beam development environment in general, I
>>> recommend taking a look at:
>>>
>>> https://beam.apache.org/get-started/quickstart-py/
>>>
>>> https://cwiki.apache.org/confluence/display/BEAM/Nexmark
>>>
>>> For contributing and collaborating in general, please take a look at:
>>>
>>> https://beam.apache.org/contribute/
>>>
>>> There is a lot more useful information for contributors on our cwiki:
>>>
>>> https://cwiki.apache.org/confluence/display/BEAM/Apache+Beam
>>>
>>> And don't hesitate to reach out anytime here on this list with questions.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Fri, Oct 4, 2019 at 4:53 AM Manuela Chamda Tchakoute <
>>> chamdamanu...@gmail.com> wrote:
>>>
>>>> Hello.
>>>>
>>>> My name is Chamda Manuela from the University of Buea, Cameroon. I am
>>>> new to open source and comfortable with the Python programming language. I
>>>> would like to contribute to the Outreachy project "Extend the Nextmark
>>>> Benchmarking suite in Apache Beam to include python and portable runners".
>>>>
>>>> I would be glad if someone could help me with a step-by-step guide to
>>>> get started.
>>>>
>>>>
>>>> Thank you.
>>>>
>>>


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Kenneth Knowles
Don't worry about the template. It is not required. Your first email
contained enough information and the archives support the details. Your
second one contained even more, and that is certainly nice :-)

However, please do edit the subject when a vote is done, to "[RESULT]
[VOTE] ". This makes it easy to find exactly the email with
the result.

Kenn

On Fri, Oct 4, 2019 at 5:05 PM Mark Liu  wrote:

> (Sorry for the informal note. I just realized there is a template I need
> to follow for the announcement:)
>
> I'm happy to announce that we have unanimously approved this release.
>
> There are 7 approving votes, 4 of which are binding:
> * Ahmet (al...@google.com)
> * Pablo (pabl...@google.com)
> * Robert (rober...@google.com)
> * Kenneth (k...@apache.org)
>
> There are no disapproving votes.
>
> Next step is to finalize the release (merge the docs/website/blog PRs,
> publish artifacts). Please let me know if you have any questions.
>
> Thanks everyone!
>
> Regards,
> Mark
>
> On Fri, Oct 4, 2019 at 4:19 PM Mark Liu  wrote:
>
>> Thank you all for rc validation and voting!
>>
>> We collected 7 votes, including 4 from the PMC, and all 2.16 JIRA issues are
>> resolved. This meets the release finalization criteria and I'll go ahead with
>> the next steps.
>>
>> Thanks,
>> Mark
>>
>>
>>
>> On Fri, Oct 4, 2019 at 4:02 PM Robin Qiu  wrote:
>>
>>> +1
>>>
>>> Verified the new module sdks/java/extensions/zetasketch works (on direct
>>> runner)
>>>
>>> On Fri, Oct 4, 2019 at 12:41 PM Kenneth Knowles  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>>  - Reviewed what verifications had been done. Nice.
>>>>  - Also did a gradle build of some targets in the archival source release
>>>>
>>>> The source release still does not build as a whole, as it has not since
>>>> 2.9.0 it seems. It is not as simple as excluding website from the build,
>>>> because it fails at configure time. Since particular artifacts can build,
>>>> it is not a blocker, but I've taken
>>>> https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to
>>>> critical and put 2.17.0 as Release Version.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada wrote:
>>>>
> Hi all,
> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it
> seems like the user has a workaround - is that correct?
> If that's the case, then I vote +1.
>
> @Max - lmk if you'd like to discuss further, but for now my vote is
> on +1.
> Best
> -P.
>
> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>
>> +1 (forgot to vote)
>>
>> I also triggered Java Nexmark on direct, dataflow, spark and flink
>> runner. Didn't see any performance regression on the dashboard (
>> https://apache-beam-testing.appspot.com/dashboard-admin)
>>
>> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>>
>>> Thanks for the validation work! I validated the following:
>>>
>>> - Java Quickstart on direct, dataflow, spark local, flink local runner
>>> - Java mobile gaming on direct and dataflow runner
>>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
>>> wheels/zip
>>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
>>> wheels/zip on direct and dataflow runner
>>>
>>> Mark
>>>
>>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>>>
 I see most of the release validations have been completed and
 marked in the spreadsheet. Thank you all for doing that. If you have 
 not
 validated/voted yet please take a look at the release candidate.

 On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:

> I think there is a different reason why the release manager should
> probably merge/approve all PRs that go into the release branch while 
> the
> release is in progress:
>
> If/when the need arises for another RC, then only those changes
> should be included that are deemed blockers or explicitly agreed. 
> Otherwise
> the release can potentially be delayed by modifications that 
> invalidate
> prior verification or introduce new instability.
>

 I agree with this reasoning. It expresses my concern in a more
 clear way.


>
> Thomas
>
>
> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
> wrote:
>
>>  > For the next time, may I suggest asking release manager to do
>> the
>>  > merging to the release branch. We do not know whether there
>> will be an
>>  > RC2 or not. And if there will not be an RC2 release branch as
>> of now
>>  > does not directly correspond to what will be released.
>>
>> The ground truth for releases are the release tags, not the
>>

Re: Plan for dropping python 2 support

2019-10-04 Thread Valentyn Tymofieiev
On Fri, Oct 4, 2019 at 11:02 AM Robert Bradshaw  wrote:

> Thanks for holding this vote. Note that this is a pledge to remove
> support sometime in 2020, but no promises as to whether that will be
> January or December (though I hope sooner rather than later)


Right.


>
> Valentyn, did you want to go ahead and make a PR adding Apache Beam to
> the python3statement page?
>

Yes, I sent
https://github.com/python3statement/python3statement.github.io/pull/265.

>
> On Mon, Sep 30, 2019 at 5:10 PM Valentyn Tymofieiev 
> wrote:
> >
> > As suggested and enthusiastically supported by several folks in this
> thread, I will send a vote to sign a pledge on http://python3statement.org
> on behalf of Apache Beam to discontinue Python 2 support in or before 2020.
> >
> > The motivation for signing the pledge is:
> > - to provide another signal to Beam users, and projects that depend on
> Beam that Beam Python 2 offering will soon sunset;
> > - to facilitate adoption of Python 3 by Beam users, developers, and
> runner maintainers;
> > - to facilitate adoption of Python 3 in wider Python ecosystem.
> >
> > See also http://python3statement.org for background behind this pledge
> and the list of projects which have already signed it.
> >
> > On Mon, Sep 23, 2019 at 4:45 PM Kyle Weaver  wrote:
> >>
> >> Re feedback collection, we already print a message:
> >> "You are using Apache Beam with Python 2. New releases of Apache Beam
> will soon support Python 3 only."
> >> When users run Python 2 pipelines. This might be a good place to
> provide additional info along with a place to send feedback (probably user@).
> While I'm sure not everyone out there reads their logs, I imagine this is a
> sure and easy way of reaching at least some Python 2 users.
> >>
> >> Kyle Weaver | Software Engineer | github.com/ibzib |
> kcwea...@google.com
> >>
> >>
> >> On Fri, Sep 20, 2019 at 10:28 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> >>>
> >>> Thank you, Chad, for refreshing this conversation and adding the
> perspective of Python 2 users of Beam who have not (yet) completed the
> migration. My thoughts below.
> >>>
> >>> - It is in the best interest of everyone to ensure a smooth migration
> for Beam users. However a migration needs to happen since Python ecosystem
> is moving off of Python 2.
> >>> - Beam has a couple of dozen dependencies, and we cannot have an
> expectation that Python 2 versions of these dependencies will be maintained
> in 2020.
> >>> - BEAM-1251 should be closed, since it may communicate a signal that
> Beam does not support Python 3, while it does. Beam has first announced
> support of Python 3 in Beam 2.11.0, admittedly later than many mainstream
> libraries in Python ecosystem.
> >>> - I think Python 2 LTS release (if we continue them) may have critical
> bug fixes, but not new features, so we won't be backporting new features.
> >>> - Beam portability allows users to customize usercode runtime
> environment, and it should be possible for users to supply a Python 2 SDK
> harness container, should they have no other option. This would require a
> backported user-supplied version of Beam SDK that works on Python 2,
> although such SDK may become difficult/impractical to maintain for most
> users.
> >>> - There are several open issues related to Python 3, but they are
> improvements in nature, and we are steadily closing them off. I am not
> aware of any adoption blockers for Beam Python 3, specific to Beam.
> >>> - I have not heard reports from users who attempted but were not able to
> use Beam on Python 3.
> >>> - This does not mean that our offering is perfect, there may be errors
> and omissions that are yet to be discovered. However, it would be in the
> best interest of the Beam community to discover these issues earlier. A
> message that Beam will discontinue Python 2 support will encourage users to
> migrate, therefore I also support Beam signing
> https://python3statement.org.
> >>> - Having more usage statistics and feedback closer to 2020 can help us
> be more confident in deciding when to stop Python 2 support.
> >>>
> >>> On Thu, Sep 19, 2019 at 6:05 PM Ahmet Altay  wrote:
> 
>  Thanks a lot for sharing your thoughts, I completely agree that we
> need to minimize the burden on our users as much as possible. Especially in
> this case when we are offering a robust python 3 solution just now. However
> I do share the same concerns related to dependencies and tool chains, It
> will be increasingly difficult for us to keep our code base compatible with
> python2 and python3 overtime. (To be very explicit, one of those
> dependencies is Dataflow's python pre-portability workers.)
> 
>  On Thu, Sep 19, 2019 at 5:17 PM Maximilian Michels 
> wrote:
> >
> > Granted that we just have finalized the Python 3 support, we should
> > allow time for it to mature and for users to make the switch.
> >
> > > Oh, and one more thing, I think it'd make sense for Apache Beam to
> >>>

Re: Introduction to the mailing list

2019-10-04 Thread Kenneth Knowles
Welcome!

On Fri, Oct 4, 2019 at 1:29 PM Ahmet Altay  wrote:

> Welcome Manuela.
>
> You can look at (https://issues.apache.org/jira/browse/BEAM-2855) as the
> starting point. It has links to 2 previously merged PRs that can serve as
> starting points for writing new nexmark queries. There are nexmark queries
> described in (https://cwiki.apache.org/confluence/display/BEAM/Nexmark).
> Second step would be picking a few more of those and starting with the
> implementation.
>
> +Ismaël Mejía  and I can help with PR reviews and
> specific questions as well.
>
> Hope this helps.
>
> Ahmet
>
> On Fri, Oct 4, 2019 at 9:43 AM Thomas Weise  wrote:
>
>> Welcome, Manuela!
>>
>> To get familiar with the Beam development environment in general, I
>> recommend taking a look at:
>>
>> https://beam.apache.org/get-started/quickstart-py/
>>
>> https://cwiki.apache.org/confluence/display/BEAM/Nexmark
>>
>> For contributing and collaborating in general, please take a look at:
>>
>> https://beam.apache.org/contribute/
>>
>> There is a lot more useful information for contributors on our cwiki:
>>
>> https://cwiki.apache.org/confluence/display/BEAM/Apache+Beam
>>
>> And don't hesitate to reach out anytime here on this list with questions.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Fri, Oct 4, 2019 at 4:53 AM Manuela Chamda Tchakoute <
>> chamdamanu...@gmail.com> wrote:
>>
>>> Hello.
>>>
>>> My name is Chamda Manuela from the University of Buea, Cameroon. I am
>>> new to open source and comfortable with the Python programming language. I
>>> would like to contribute to the Outreachy project "Extend the Nextmark
>>> Benchmarking suite in Apache Beam to include python and portable runners".
>>>
>>> I would be glad if someone could help me with a step-by-step guide to get
>>> started.
>>>
>>>
>>> Thank you.
>>>
>>


[portability] Removing the old portable metrics API...

2019-10-04 Thread Pablo Estrada
Hello devs,
I recently took a look at how Dataflow is retrieving metrics from the Beam
SDK harnesses, and noticed something. As you may (or may not) remember, the
portability API currently has two ways of reporting metrics. Namely, the
newer MonitoringInfo API[1], and the older Metrics one[2].

This is somewhat troublesome because now we have two things that do the
same thing. The SDKs report double the amount of metrics[3][4], and I bet
it's confusing for runner implementers.

Luckily, it seems like the Flink and Spark runners do use the new API
[5][6] - yay! : ) - so I guess then the only runner that uses the old API
is Dataflow? (internally)

Which way does the Samza runner use? +Hai Lu?
How about the Go SDK, +Robert Burke? Ah, I bet this uses
the old API?

If they all use MonitoringInfos, we may be able to clean up the old
API and move to the new one (somewhat) soon : )
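To make the duplication concrete, here is a toy sketch of one counter being reported in both shapes. It is only an illustration: `LegacyCounter`, `ToyMonitoringInfo`, `report_both` and the `toy:user_counter` URN are hypothetical stand-ins, not the real proto classes or Beam URNs; only the idea of "a URN plus labels" is taken from the MonitoringInfo message linked in [1].

```python
from dataclasses import dataclass, field


@dataclass
class LegacyCounter:
    # Toy stand-in for the older, per-transform metrics structure.
    ptransform: str
    namespace: str
    name: str
    value: int


@dataclass
class ToyMonitoringInfo:
    # Toy stand-in for a MonitoringInfo: a URN, identifying labels, a value.
    urn: str
    labels: dict = field(default_factory=dict)
    value: int = 0


def report_both(counters):
    """Emit every counter in both shapes, as a harness serving both APIs would."""
    legacy = list(counters)
    infos = [
        ToyMonitoringInfo(
            urn="toy:user_counter",  # placeholder, not a real Beam URN
            labels={
                "PTRANSFORM": c.ptransform,
                "NAMESPACE": c.namespace,
                "NAME": c.name,
            },
            value=c.value,
        )
        for c in counters
    ]
    return legacy, infos


counters = [LegacyCounter("ParDo(Split)", "wordcount", "words", 42)]
legacy, infos = report_both(counters)
print(len(legacy) + len(infos), "records reported for", len(counters), "counter")
```

Dropping the legacy list once every runner reads the second shape is the cleanup being proposed here.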

[1]
https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
[2]
https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
[3]
https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
[4]
https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384

[5]
https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
[6]
https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Mark Liu
(Sorry for the informal note. I just realized there is a template I need to
follow for the announcement:)

I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Ahmet (al...@google.com)
* Pablo (pabl...@google.com)
* Robert (rober...@google.com)
* Kenneth (k...@apache.org)

There are no disapproving votes.
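The tally above can be checked mechanically. This is a minimal sketch, assuming the usual ASF majority-approval rule for releases (at least three binding +1 votes, and more binding +1 than binding -1 votes). The `Vote` type and `release_vote_passes` helper are hypothetical names, and the three non-binding voters are placeholders because the announcement does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 approve, -1 disapprove, 0 abstain
    binding: bool  # True when cast by a PMC member


def release_vote_passes(votes):
    """Assumed rule: >= 3 binding +1 votes and more binding +1 than binding -1."""
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    return binding_plus >= 3 and binding_plus > binding_minus


# Binding voters as named in the announcement; the rest are placeholders.
votes = [Vote(name, +1, True) for name in ("Ahmet", "Pablo", "Robert", "Kenneth")]
votes += [Vote("non-binding voter %d" % i, +1, False) for i in range(1, 4)]

print(len(votes), "approving votes; passes:", release_vote_passes(votes))
```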

Next step is to finalize the release (merge the docs/website/blog PRs,
publish artifacts). Please let me know if you have any questions.

Thanks everyone!

Regards,
Mark

On Fri, Oct 4, 2019 at 4:19 PM Mark Liu  wrote:

> Thank you all for rc validation and voting!
>
> We collected 7 votes, including 4 from the PMC, and all 2.16 JIRA issues are
> resolved. This meets the release finalization criteria and I'll go ahead with
> the next steps.
>
> Thanks,
> Mark
>
>
>
> On Fri, Oct 4, 2019 at 4:02 PM Robin Qiu  wrote:
>
>> +1
>>
>> Verified the new module sdks/java/extensions/zetasketch works (on direct
>> runner)
>>
>> On Fri, Oct 4, 2019 at 12:41 PM Kenneth Knowles  wrote:
>>
>>> +1 (binding)
>>>
>>>  - Reviewed what verifications had been done. Nice.
>>>  - Also did a gradle build of some targets in the archival source release
>>>
>>> The source release still does not build as a whole, as it has not since
>>> 2.9.0 it seems. It is not as simple as excluding website from the build,
>>> because it fails at configure time. Since particular artifacts can build,
>>> it is not a blocker, but I've taken
>>> https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to
>>> critical and put 2.17.0 as Release Version.
>>>
>>> Kenn
>>>
>>> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada 
>>> wrote:
>>>
 Hi all,
 I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it
 seems like the user has a workaround - is that correct?
 If that's the case, then I vote +1.

 @Max - lmk if you'd like to discuss further, but for now my vote is
 on +1.
 Best
 -P.

 On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:

> +1 (forgot to vote)
>
> I also triggered Java Nexmark on direct, dataflow, spark and flink
> runner. Didn't see any performance regression on the dashboard (
> https://apache-beam-testing.appspot.com/dashboard-admin)
>
> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>
>> Thanks for the validation work! I validated the following:
>>
>> - Java Quickstart on direct, dataflow, spark local, flink local runner
>> - Java mobile gaming on direct and dataflow runner
>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
>> wheels/zip
>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
>> wheels/zip on direct and dataflow runner
>>
>> Mark
>>
>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>>
>>> I see most of the release validations have been completed and marked
>>> in the spreadsheet. Thank you all for doing that. If you have not
>>> validated/voted yet please take a look at the release candidate.
>>>
>>> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>>>
 I think there is a different reason why the release manager should
 probably merge/approve all PRs that go into the release branch while 
 the
 release is in progress:

 If/when the need arises for another RC, then only those changes
 should be included that are deemed blockers or explicitly agreed. 
 Otherwise
 the release can potentially be delayed by modifications that invalidate
 prior verification or introduce new instability.

>>>
>>> I agree with this reasoning. It expresses my concern in a more clear
>>> way.
>>>
>>>

 Thomas


 On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
 wrote:

>  > For the next time, may I suggest asking release manager to do
> the
>  > merging to the release branch. We do not know whether there
> will be an
>  > RC2 or not. And if there will not be an RC2 release branch as
> of now
>  > does not directly correspond to what will be released.
>
> The ground truth for releases are the release tags, not the
> release
> branches. Downstream projects should not depend on the release
> branches.
> Release branches are merely important for the process of creating
> a
> release, but they lose validity after the RC has been created and
> released.
>
> On 02.10.19 11:45, Ahmet Altay wrote:
> > +1 (validated python quickstarts). Thank you Mark.
> >
> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels <
> m...@apache.org
> > > wrote:
> >
> 

Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Mark Liu
Thank you all for rc validation and voting!

We collected 7 votes, including 4 from the PMC, and all 2.16 JIRA issues are
resolved. This meets the release finalization criteria and I'll go ahead with
the next steps.

Thanks,
Mark



On Fri, Oct 4, 2019 at 4:02 PM Robin Qiu  wrote:

> +1
>
> Verified the new module sdks/java/extensions/zetasketch works (on direct
> runner)
>
> On Fri, Oct 4, 2019 at 12:41 PM Kenneth Knowles  wrote:
>
>> +1 (binding)
>>
>>  - Reviewed what verifications had been done. Nice.
>>  - Also did a gradle build of some targets in the archival source release
>>
>> The source release still does not build as a whole, as it has not since
>> 2.9.0 it seems. It is not as simple as excluding website from the build,
>> because it fails at configure time. Since particular artifacts can build,
>> it is not a blocker, but I've taken
>> https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to critical
>> and put 2.17.0 as Release Version.
>>
>> Kenn
>>
>> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:
>>
>>> Hi all,
>>> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it
>>> seems like the user has a workaround - is that correct?
>>> If that's the case, then I vote +1.
>>>
>>> @Max - lmk if you'd like to discuss further, but for now my vote is
>>> on +1.
>>> Best
>>> -P.
>>>
>>> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>>>
 +1 (forgot to vote)

 I also triggered Java Nexmark on direct, dataflow, spark and flink
 runner. Didn't see any performance regression on the dashboard (
 https://apache-beam-testing.appspot.com/dashboard-admin)

 On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:

> Thanks for the validation work! I validated the following:
>
> - Java Quickstart on direct, dataflow, spark local, flink local runner
> - Java mobile gaming on direct and dataflow runner
> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
> wheels/zip
> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
> wheels/zip on direct and dataflow runner
>
> Mark
>
> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>
>> I see most of the release validations have been completed and marked
>> in the spreadsheet. Thank you all for doing that. If you have not
>> validated/voted yet please take a look at the release candidate.
>>
>> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>>
>>> I think there is a different reason why the release manager should
>>> probably merge/approve all PRs that go into the release branch while the
>>> release is in progress:
>>>
>>> If/when the need arises for another RC, then only those changes
>>> should be included that are deemed blockers or explicitly agreed. 
>>> Otherwise
>>> the release can potentially be delayed by modifications that invalidate
>>> prior verification or introduce new instability.
>>>
>>
>> I agree with this reasoning. It expresses my concern in a more clear
>> way.
>>
>>
>>>
>>> Thomas
>>>
>>>
>>> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
>>> wrote:
>>>
  > For the next time, may I suggest asking release manager to do the
  > merging to the release branch. We do not know whether there will
 be an
  > RC2 or not. And if there will not be an RC2 release branch as of
 now
  > does not directly correspond to what will be released.

 The ground truth for releases are the release tags, not the release
 branches. Downstream projects should not depend on the release
 branches.
 Release branches are merely important for the process of creating a
 release, but they lose validity after the RC has been created and
 released.

 On 02.10.19 11:45, Ahmet Altay wrote:
 > +1 (validated python quickstarts). Thank you Mark.
 >
 > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels <
 m...@apache.org
 > > wrote:
 >
 > Thanks for preparing the release, Mark! I would like to
 address
 > https://issues.apache.org/jira/browse/BEAM-8303 in the
 release. I've
 > already merged the fix to the release-2.16.0 branch. If we do
 another
 > RC, we could include it. As a user is blocked on this, I
 would not vote
 > +1 for this RC, but I also do not want to block the release
 process.
 >
 >
 > Max, thank you for the clear communication for the importance and
 at the
 > same time non-blocking status of the issue.
 >
 > For the next time, may I suggest asking release manager to do the
 > merging to the release branch. We do not know whe

Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Robin Qiu
+1

Verified the new module sdks/java/extensions/zetasketch works (on direct
runner)

On Fri, Oct 4, 2019 at 12:41 PM Kenneth Knowles  wrote:

> +1 (binding)
>
>  - Reviewed what verifications had been done. Nice.
>  - Also did a gradle build of some targets in the archival source release
>
> The source release still does not build as a whole, as it has not since
> 2.9.0 it seems. It is not as simple as excluding website from the build,
> because it fails at configure time. Since particular artifacts can build,
> it is not a blocker, but I've taken
> https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to critical
> and put 2.17.0 as Release Version.
>
> Kenn
>
> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:
>
>> Hi all,
>> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it
>> seems like the user has a workaround - is that correct?
>> If that's the case, then I vote +1.
>>
>> @Max - lmk if you'd like to discuss further, but for now my vote is on +1.
>> Best
>> -P.
>>
>> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>>
>>> +1 (forgot to vote)
>>>
>>> I also triggered Java Nexmark on direct, dataflow, spark and flink
>>> runner. Didn't see any performance regression on the dashboard (
>>> https://apache-beam-testing.appspot.com/dashboard-admin)
>>>
>>> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>>>
 Thanks for the validation work! I validated the following:

 - Java Quickstart on direct, dataflow, spark local, flink local runner
 - Java mobile gaming on direct and dataflow runner
 - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
 wheels/zip
 - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
 wheels/zip on direct and dataflow runner

 Mark

 On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:

> I see most of the release validations have been completed and marked
> in the spreadsheet. Thank you all for doing that. If you have not
> validated/voted yet please take a look at the release candidate.
>
> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>
>> I think there is a different reason why the release manager should
>> probably merge/approve all PRs that go into the release branch while the
>> release is in progress:
>>
>> If/when the need arises for another RC, then only those changes
>> should be included that are deemed blockers or explicitly agreed. 
>> Otherwise
>> the release can potentially be delayed by modifications that invalidate
>> prior verification or introduce new instability.
>>
>
> I agree with this reasoning. It expresses my concern in a more clear
> way.
>
>
>>
>> Thomas
>>
>>
>> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
>> wrote:
>>
>>>  > For the next time, may I suggest asking release manager to do the
>>>  > merging to the release branch. We do not know whether there will
>>> be an
>>>  > RC2 or not. And if there will not be an RC2 release branch as of
>>> now
>>>  > does not directly correspond to what will be released.
>>>
>>> The ground truth for releases are the release tags, not the release
>>> branches. Downstream projects should not depend on the release
>>> branches.
>>> Release branches are merely important for the process of creating a
>>> release, but they lose validity after the RC has been created and
>>> released.
>>>
>>> On 02.10.19 11:45, Ahmet Altay wrote:
>>> > +1 (validated python quickstarts). Thank you Mark.
>>> >
>>> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
>>> >
>>> > Thanks for preparing the release, Mark! I would like to address
>>> > https://issues.apache.org/jira/browse/BEAM-8303 in the
>>> release. I've
>>> > already merged the fix to the release-2.16.0 branch. If we do
>>> another
>>> > RC, we could include it. As a user is blocked on this, I would
>>> not vote
>>> > +1 for this RC, but I also do not want to block the release
>>> process.
>>> >
>>> >
>>> > Max, thank you for the clear communication for the importance and
>>> at the
>>> > same time non-blocking status of the issue.
>>> >
>>> > For the next time, may I suggest asking release manager to do the
>>> > merging to the release branch. We do not know whether there will
>>> be an
>>> > RC2 or not. And if there will not be an RC2 release branch as of
>>> now
>>> > does not directly correspond to what will be released.
>>> >
>>> >
>>> > On 01.10.19 09:18, Mark Liu wrote:
>>> >  > Hi everyone,
>>> >  >
>>> >  > Please review and vote on the release candidate #1 for the
>>> version
>>> >  > 2.16.0, as follows:
>>> >  > [ ] +1, Approve the release
>>> >  > [ ] -1, Do not

Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Valentyn Tymofieiev
I also ran portable wordcount on Spark and Flink runners using
docker container images that we are releasing with 2.16.0. I was using the
SDK code from RC1 tag in github, and pulled the container image from Docker
repo as follows:

git checkout tags/v2.16.0-RC1

./gradlew :runners:spark:job-server:runShadow
or
./gradlew :runners:flink:1.5:job-server:runShadow

In a separate terminal:

docker pull apachebeam/python3.5_sdk:2.16.0_rc1
docker tag apachebeam/python3.5_sdk:2.16.0_rc1 apachebeam/python3.5_sdk:2.16.0

./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch \
    -PjobEndpoint=localhost:8099 -PenvironmentType=DOCKER

As soon as that is done:

docker ps -a | grep apachebeam

Note the container ID of a running container. It stays around only for a
minute or so before it gets garbage-collected.

docker exec 866fb8932207 /bin/bash -c 'cat /tmp/py-wordcount*'
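
For anyone repeating this outside Gradle, the flags for a Python pipeline run against the job server can be sketched as below. This assumes the Gradle properties used above correspond to the SDK's `--job_endpoint` and `--environment_type` pipeline flags, which this thread does not show; `portable_pipeline_args` is a hypothetical helper name.

```python
def portable_pipeline_args(job_endpoint="localhost:8099", environment_type="DOCKER"):
    """Build pipeline flags for a run against an already started job server."""
    return [
        "--runner=PortableRunner",
        "--job_endpoint=%s" % job_endpoint,
        "--environment_type=%s" % environment_type,
    ]


args = portable_pipeline_args()
# With apache_beam installed, this list can be handed to
# apache_beam.options.pipeline_options.PipelineOptions(args).
print(" ".join(args))
```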



Still voting +1.

On Fri, Oct 4, 2019 at 12:41 PM Kenneth Knowles  wrote:

> +1 (binding)
>
>  - Reviewed what verifications had been done. Nice.
>  - Also did a gradle build of some targets in the archival source release
>
> The source release still does not build as a whole, as it has not since
> 2.9.0 it seems. It is not as simple as excluding website from the build,
> because it fails at configure time. Since particular artifacts can build,
> it is not a blocker, but I've taken
> https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to critical
> and put 2.17.0 as Release Version.
>
> Kenn
>
> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:
>
>> Hi all,
>> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it
>> seems like the user has a workaround - is that correct?
>> If that's the case, then I vote +1.
>>
>> @Max - lmk if you'd like to discuss further, but for now my vote is on +1.
>> Best
>> -P.
>>
>> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>>
>>> +1 (forgot to vote)
>>>
>>> I also triggered Java Nexmark on direct, dataflow, spark and flink
>>> runner. Didn't see any performance regression on the dashboard (
>>> https://apache-beam-testing.appspot.com/dashboard-admin)
>>>
>>> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>>>
 Thanks for the validation work! I validated the following:

 - Java Quickstart on direct, dataflow, spark local, flink local runner
 - Java mobile gaming on direct and dataflow runner
 - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
 wheels/zip
 - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
 wheels/zip on direct and dataflow runner

 Mark

 On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:

> I see most of the release validations have been completed and marked
> in the spreadsheet. Thank you all for doing that. If you have not
> validated/voted yet please take a look at the release candidate.
>
> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>
>> I think there is a different reason why the release manager should
>> probably merge/approve all PRs that go into the release branch while the
>> release is in progress:
>>
>> If/when the need arises for another RC, then only those changes
>> should be included that are deemed blockers or explicitly agreed. 
>> Otherwise
>> the release can potentially be delayed by modifications that invalidate
>> prior verification or introduce new instability.
>>
>
> I agree with this reasoning. It expresses my concern in a more clear
> way.
>
>
>>
>> Thomas
>>
>>
>> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
>> wrote:
>>
>>>  > For the next time, may I suggest asking release manager to do the
>>>  > merging to the release branch. We do not know whether there will
>>> be an
>>>  > RC2 or not. And if there will not be an RC2 release branch as of
>>> now
>>>  > does not directly correspond to what will be released.
>>>
>>> The ground truth for releases are the release tags, not the release
>>> branches. Downstream projects should not depend on the release
>>> branches.
>>> Release branches are merely important for the process of creating a
>>> release, but they lose validity after the RC has been created and
>>> released.
>>>
>>> On 02.10.19 11:45, Ahmet Altay wrote:
>>> > +1 (validated python quickstarts). Thank you Mark.
>>> >
>>> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
>>> >
>>> > Thanks for preparing the release, Mark! I would like to address
>>> > https://issues.apache.org/jira/browse/BEAM-8303 in the
>>> release. I've
>>> > already merged the fix to the release-2.16.0 branch. If we do
>>> another
>>> > RC, we could include it. As a user is blocked on this, I would
>>> not vote
>>> > +1 for this RC, but I also do not want to block the release
>>> pro

Re: Introduction to the mailing list

2019-10-04 Thread Ahmet Altay
Welcome Manuela.

You can look at (https://issues.apache.org/jira/browse/BEAM-2855) as the
starting point. It has links to 2 previously merged PRs that can serve as
starting points for writing new nexmark queries. There are nexmark queries
described in (https://cwiki.apache.org/confluence/display/BEAM/Nexmark).
Second step would be picking a few more of those and starting with the
implementation.

+Ismaël Mejía  and I can help with PR reviews and
specific questions as well.

Hope this helps.

Ahmet

On Fri, Oct 4, 2019 at 9:43 AM Thomas Weise  wrote:

> Welcome, Manuela!
>
> For getting familiar with the Beam development environment in general, I
> would recommend to take a look at:
>
> https://beam.apache.org/get-started/quickstart-py/
>
> https://cwiki.apache.org/confluence/display/BEAM/Nexmark
>
> For contributing and collaborating in general, please take a look at:
>
> https://beam.apache.org/contribute/
>
> There is a lot more useful information for contributors on our cwiki:
>
> https://cwiki.apache.org/confluence/display/BEAM/Apache+Beam
>
> And don't hesitate to reach out anytime here on this list with questions.
>
> Thanks,
> Thomas
>
>
> On Fri, Oct 4, 2019 at 4:53 AM Manuela Chamda Tchakoute <
> chamdamanu...@gmail.com> wrote:
>
>> Hello.
>>
>>  My name is Chamda Manuela from the University of Buea, Cameroon. I am
>> new to open source and comfortable with the Python programming language. I
>> would like to contribute to the outreachy project "Extend the Nextmark
>> Benchmarking suite in Apache Beam to include python and portable runners".
>>
>> I will be glad if someone could help me with a step by step guide to get
>> started.
>>
>>
>> Thank you.
>>
>


Re: outreachy intern

2019-10-04 Thread Kenneth Knowles
Welcome!

On Fri, Oct 4, 2019 at 10:28 AM Rui Wang  wrote:

> Welcome Diksha!
>
>
> Can you share your username of [1]?
>
> [1]: https://jira.apache.org/jira/secure/Dashboard.jspa
>
> -Rui
>
> On Fri, Oct 4, 2019 at 9:44 AM Thomas Weise  wrote:
>
>> Welcome, Diksha!
>>
>>
>> On Fri, Oct 4, 2019 at 8:47 AM diksha gupta 
>> wrote:
>>
>>> Hi, I am Diksha Gupta, outreachy intern.
>>> I will work with your host on beamSQL.
>>>
>>


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Kenneth Knowles
+1 (binding)

 - Reviewed what verifications had been done. Nice.
 - Also did a gradle build of some targets in the archival source release

The source release still does not build as a whole, as it has not since
2.9.0 it seems. It is not as simple as excluding website from the build,
because it fails at configure time. Since particular artifacts can build,
it is not a blocker, but I've taken
https://issues.apache.org/jira/browse/BEAM-6228 and upgraded to critical
and put 2.17.0 as Release Version.

Kenn

On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:

> Hi all,
> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it seems
> like the user has a workaround - is that correct?
> If that's the case, then I vote +1.
>
> @Max - lmk if you'd like to discuss further, but for now my vote is on +1.
> Best
> -P.
>
> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>
>> +1 (forgot to vote)
>>
>> I also triggered Java Nexmark on direct, dataflow, spark and flink
>> runner. Didn't see any performance regression from the dashboard (
>> https://apache-beam-testing.appspot.com/dashboard-admin)
>>
>> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>>
>>> Thanks for the validation work! I validated the following:
>>>
>>> - Java Quickstart on direct, dataflow,spark local, flink local runner
>>> - Java mobile gaming on direct and dataflow runner
>>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
>>> wheels/zip
>>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
>>> wheels/zip on direct and dataflow runner
>>>
>>> Mark
>>>
>>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>>>
 I see most of the release validations have been completed and marked in
 the spreadsheet. Thank you all for doing that. If you have not
 validated/voted yet please take a look at the release candidate.

 On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:

> I think there is a different reason why the release manager should
> probably merge/approve all PRs that go into the release branch while the
> release is in progress:
>
> If/when the need arises for another RC, then only those changes should
> be included that are deemed blockers or explicitly agreed. Otherwise the
> release can potentially be delayed by modifications that invalidate prior
> verification or introduce new instability.
>

 I agree with this reasoning. It expresses my concern in a more clear
 way.


>
> Thomas
>
>
> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
> wrote:
>
>>  > For the next time, may I suggest asking release manager to do the
>>  > merging to the release branch. We do not know whether there will
>> be an
>>  > RC2 or not. And if there will not be an RC2 release branch as of
>> now
>>  > does not directly correspond to what will be released.
>>
>> The ground truth for releases are the release tags, not the release
>> branches. Downstream projects should not depend on the release
>> branches.
>> Release branches are merely important for the process of creating a
>> release, but they lose validity after the RC has been created and
>> released.
>>
>> On 02.10.19 11:45, Ahmet Altay wrote:
>> > +1 (validated python quickstarts). Thank you Mark.
>> >
>> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
>> >
>> > Thanks for preparing the release, Mark! I would like to address
>> > https://issues.apache.org/jira/browse/BEAM-8303 in the
>> release. I've
>> > already merged the fix to the release-2.16.0 branch. If we do
>> another
>> > RC, we could include it. As a user is blocked on this, I would
>> not vote
>> > +1 for this RC, but I also do not want to block the release
>> process.
>> >
>> >
>> > Max, thank you for the clear communication for the importance and
>> at the
>> > same time non-blocking status of the issue.
>> >
>> > For the next time, may I suggest asking release manager to do the
>> > merging to the release branch. We do not know whether there will be
>> an
>> > RC2 or not. And if there will not be an RC2 release branch as of
>> now
>> > does not directly correspond to what will be released.
>> >
>> >
>> > On 01.10.19 09:18, Mark Liu wrote:
>> >  > Hi everyone,
>> >  >
>> >  > Please review and vote on the release candidate #1 for the
>> version
>> >  > 2.16.0, as follows:
>> >  > [ ] +1, Approve the release
>> >  > [ ] -1, Do not approve the release (please provide specific
>> comments)
>> >  >
>> >  >
>> >  > The complete staging area is available for your review, which
>> > includes:
>> >  > * JIRA release notes [1],
>> >  > * the official Apache source relea

Spring with Apache Beam

2019-10-04 Thread Jitendra kumavat
Hi,

I want to add Spring framework in my apache beam project.  Somehow i am
unable to inject the Spring Application context to executing ParDo
functions. I couldn't find the way to do so? Can you please let me know how
to integrate Spring runtime application context with Apache Beam pipeline.

Thanks,
Jitendra


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Robert Bradshaw
OK, this appears to have been a weird config issue on my system
(though the error certainly could have been better). As BEAM-8303 has
a workaround and all else is looking good, I don't think that's worth
another RC.

+1 (binding) to this release.

On Fri, Oct 4, 2019 at 10:56 AM Robert Bradshaw  wrote:
>
> The artifact signatures and contents all look good to me. I've also
> verified that the wheels work for the direct runner. However, I'm having an
> issue with trying to run on dataflow with Python 3.6:
>
> python -m apache_beam.examples.wordcount   --input
> gs://clouddfe-robertwb/chicago_taxi_data/eval/data.csv   --output
> gs://clouddfe-robertwb/test/xcounts.txt   --runner=Dataflow
> --project=google.com:clouddfe
> --temp_location=gs://clouddfe-robertwb/fn-api/tmp
> --staging_location=gs://clouddfe-robertwb/tmp
> --sdk_location=staging/apache-beam-2.16.0.zip
> ...
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py",
> line 374, in exists
> self.client.objects.Get(request)  # metadata
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py",
> line 1100, in Get
> download=download)
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/base_api.py",
> line 729, in _RunMethod
> http, http_request, **opts)
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/http_wrapper.py",
> line 360, in MakeRequest
> max_retry_wait, total_wait_sec))
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio_overrides.py",
> line 43, in retry_func
> return http_wrapper.HandleExceptionsAndRebuildHttpConnections(retry_args)
>   File 
> "/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/http_wrapper.py",
> line 294, in HandleExceptionsAndRebuildHttpConnections
> retry_args.exc.status >= 500)):
>
> Is this just me or a wider issue?
>
> On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:
> >
> > Hi all,
> > I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it seems 
> > like the user has a workaround - is that correct?
> > If that's the case, then I vote +1.
> >
> > @Max - lmk if you'd like to discuss further, but for now my vote is on +1.
> > Best
> > -P.
> >
> > On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
> >>
> >> +1 (forgot to vote)
> >>
> >> I also triggered Java Nexmark on direct, dataflow, spark and flink runner. 
> >> Didn't see any performance regression from the dashboard
> >> (https://apache-beam-testing.appspot.com/dashboard-admin)
> >>
> >> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
> >>>
> >>> Thanks for the validation work! I validated the following:
> >>>
> >>> - Java Quickstart on direct, dataflow,spark local, flink local runner
> >>> - Java mobile gaming on direct and dataflow runner
> >>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
> >>> wheels/zip
> >>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
> >>> wheels/zip on direct and dataflow runner
> >>>
> >>> Mark
> >>>
> >>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
> 
>  I see most of the release validations have been completed and marked in 
>  the spreadsheet. Thank you all for doing that. If you have not 
>  validated/voted yet please take a look at the release candidate.
> 
>  On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
> >
> > I think there is a different reason why the release manager should 
> > probably merge/approve all PRs that go into the release branch while 
> > the release is in progress:
> >
> > If/when the need arises for another RC, then only those changes should 
> > be included that are deemed blockers or explicitly agreed. Otherwise 
> > the release can potentially be delayed by modifications that invalidate 
> > prior verification or introduce new instability.
> 
> 
>  I agree with this reasoning. It expresses my concern in a more clear way.
> 
> >
> >
> > Thomas
> >
> >
> > On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels  
> > wrote:
> >>
> >>  > For the next time, may I suggest asking release manager to do the
> >>  > merging to the release branch. We do not know whether there will be 
> >> an
> >>  > RC2 or not. And if there will not be an RC2 release branch as of now
> >>  > does not directly correspond to what will be released.
> >>
> >> The ground truth for releases are the release tags, not the release
> >> branches. Downstream projects should not depend on

Re: using avro instead of json for BigQueryIO.Write

2019-10-04 Thread Pablo Estrada
Thanks Steve!
I'll take a look next week. Sorry about the delay so far.
Best
-P.

On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz  wrote:

> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for
> this.  The initial results look good.  I'll spend some time soon adding
> unit tests and documentation, but I'd appreciate it if someone could take a
> first pass over it.
>
> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada  wrote:
>
>> Thanks for offering to work on this! It would be awesome to have it. I
>> can say that we don't have that for Python ATM.
>>
>> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz 
>> wrote:
>>
>>> Our experience has actually been that avro is more efficient than even
>>> parquet, but that might also be skewed from our datasets.
>>>
>>> I might try to take a crack at this, I found
>>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
>>> coincidentally references my thread from a couple years ago on the read
>>> side of this :) ).
>>>
>>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax  wrote:
>>>
 It's been talked about, but nobody's done anything. There are some
 difficulties related to type conversion (json and avro don't support the
 same types), but if those are overcome then an avro version would be much
 more efficient. I believe Parquet files would be even more efficient if you
 wanted to go that path, but there might be more code to write (as we
 already have some code in the codebase to convert between TableRows and
 Avro).

 Reuven

 On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz 
 wrote:

> Has anyone investigated using avro rather than json to load data into
> BigQuery using BigQueryIO (+ FILE_LOADS)?
>
> I'd be interested in enhancing it to support this, but I'm curious if
> there's any prior work here.
>



Re: Feature addition to java CassandraIO connector

2019-10-04 Thread Pablo Estrada
Hi Vincent!
Do you think you could add some code snippets / pseudocode as to what this
looks like? Feel free to do it on email, gist, google doc, etc?
Best
-P.

On Thu, Oct 3, 2019 at 4:16 PM Vincent Marquez 
wrote:

> Currently the CassandraIO connector allows a user to specify a table, and
> the CassandraSource object generates a list of queries based on token
> ranges of the table, along with grouping them by the token ranges.
>
> I often need to run (generated, sometimes a million+) queries against a
> subset of a table.  Instead of providing a filter, it is easier and much
> more performant to supply a collection of queries along with their tokens
> to both partition and group by, instead of letting CassandraIO naively run
> over the entire table or with a simple filter.
>
> I propose in addition to the current method of supplying a table and
> filter, also allowing the user to pass in a collection of queries and
> tokens.   The current way CassandraSource breaks up the table could be
> modified to build on top of the proposed implementation to reduce code
> duplication as well.  If this sounds like an acceptable alternative way of
> using the CassandraIO connector, I don't mind giving it a shot with a pull
> request.
>
> If there is a better way of doing this, I'm eager to hear and learn.
> Thanks for reading!
>
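To make the proposal quoted above a little more concrete, here is a rough
Python sketch of grouping caller-supplied queries by the token range each
token falls into. It is only an illustration under assumptions: the function
name `group_queries_by_token_range`, the `(query, token)` pair shape, and the
sorted `split_points` list are all made up for this example and are not part
of CassandraIO or of any implementation discussed in this thread.

```python
from bisect import bisect_left
from collections import defaultdict


def group_queries_by_token_range(queries_with_tokens, split_points):
    """Group (query, token) pairs by the token range each token falls into.

    split_points is a sorted list of range upper bounds (a hypothetical
    stand-in for the token ranges of the ring). Range i covers tokens in
    (split_points[i - 1], split_points[i]]; tokens above the last bound
    land in one extra trailing range.
    """
    groups = defaultdict(list)
    for query, token in queries_with_tokens:
        # Index of the first upper bound that is >= token.
        range_index = bisect_left(split_points, token)
        groups[range_index].append(query)
    return dict(groups)


# Hypothetical queries paired with the token of the partition they target.
queries = [
    ("SELECT * FROM t WHERE id = 1", -5),
    ("SELECT * FROM t WHERE id = 2", 50),
    ("SELECT * FROM t WHERE id = 3", 60),
    ("SELECT * FROM t WHERE id = 4", 150),
]
grouped = group_queries_by_token_range(queries, split_points=[0, 100])
```

Each resulting group could then be handed to one worker, which is roughly the
"partition and group by token" behaviour the proposal describes.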


Re: Plan for dropping python 2 support

2019-10-04 Thread Robert Bradshaw
Thanks for holding this vote. Note that this is a pledge to remove
support sometime in 2020, but no promises as to whether that will be
January or December (though I hope sooner rather than later).

Valentyn, did you want to go ahead and make a PR adding Apache Beam to
the python3statement page?

On Mon, Sep 30, 2019 at 5:10 PM Valentyn Tymofieiev  wrote:
>
> As suggested and enthusiastically supported by several folks in this thread, 
> I will send a vote to sign a pledge on http://python3statement.org on behalf 
> of Apache Beam to discontinue Python 2 support in or before 2020.
>
> The motivation for signing the pledge is:
> - to provide another signal to Beam users, and projects that depend on Beam 
> that Beam Python 2 offering will soon sunset;
> - to facilitate adoption of Python 3 by Beam users, developers, and runner 
> maintainers;
> - to facilitate adoption of Python 3 in wider Python ecosystem.
>
> See also http://python3statement.org for background behind this pledge and the 
> list of projects which have already signed it.
>
> On Mon, Sep 23, 2019 at 4:45 PM Kyle Weaver  wrote:
>>
>> Re feedback collection, we already print a message:
>> "You are using Apache Beam with Python 2. New releases of Apache Beam will 
>> soon support Python 3 only."
>> When users run Python 2 pipelines. This might be a good place to provide 
>> additional info along with a place to send feedback (probably user@). While 
>> I'm sure not everyone out there reads their logs, I imagine this is a sure 
>> and easy way of reaching at least some Python 2 users.
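As a rough sketch of the idea in the paragraph above (extending the existing
log message with a place to send feedback), something along these lines would
do. The message text is the one quoted above; the `FEEDBACK_HINT` wording and
the helper names `python2_warning` and `warn_if_python2` are assumptions made
for illustration and are not taken from the Beam code base.

```python
import sys
import warnings

# Message text as quoted in this thread.
PY2_MESSAGE = (
    "You are using Apache Beam with Python 2. New releases of Apache Beam "
    "will soon support Python 3 only.")

# Hypothetical extra hint; the exact wording is an assumption.
FEEDBACK_HINT = "Feedback on the migration is welcome on the user@ list."


def python2_warning(major_version):
    """Return the warning text for Python 2, or None for other versions."""
    if major_version == 2:
        return PY2_MESSAGE + " " + FEEDBACK_HINT
    return None


def warn_if_python2():
    """Emit the warning when the current interpreter is Python 2."""
    message = python2_warning(sys.version_info[0])
    if message is not None:
        warnings.warn(message)
    return message
```

Keeping the version check in a small pure function makes the message easy to
test without actually running under Python 2.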
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>
>>
>> On Fri, Sep 20, 2019 at 10:28 AM Valentyn Tymofieiev  
>> wrote:
>>>
>>> Thank you, Chad, for refreshing this conversation and adding the 
>>> perspective of Python 2 users of Beam who have not (yet) completed the 
>>> migration. My thoughts below.
>>>
>>> - It is in the best interest of everyone to ensure a smooth migration for 
>>> Beam users. However a migration needs to happen since Python ecosystem is 
>>> moving off of Python 2.
>>> - Beam has a couple of dozen dependencies, and we cannot have an 
>>> expectation that Python 2 versions of these dependencies will be maintained 
>>> in 2020.
>>> - BEAM-1251 should be closed, since it may communicate a signal that Beam 
>>> does not support Python 3, while it does. Beam has first announced support 
>>> of Python 3 in Beam 2.11.0, admittedly later than many mainstream libraries 
>>> in Python ecosystem.
>>> - I think Python 2 LTS release (if we continue them) may have critical bug 
>>> fixes, but not new features, so we won't be backporting new features.
>>> - Beam portability allows users to customize usercode runtime environment, 
>>> and it should be possible for users to supply a Python 2 SDK harness 
>>> container, should they have no other option. This would require a 
>>> backported user-supplied version of Beam SDK that works on Python 2, 
>>> although such SDK may become difficult/impractical to maintain for most 
>>> users.
>>> - There are several open issues related to Python 3, but they are 
>>> improvements in nature, and we are steadily closing them off. I am not 
>>> aware of any adoption blockers for Beam Python 3, specific to Beam.
>>> - I have not heard reports from users who attempted but were not able to use 
>>> Beam on Python 3.
>>> - This does not mean that our offering is perfect, there may be errors and 
>>> omissions that are yet to be discovered. However, it would be in the best 
>>> interest of the Beam community to discover these issues earlier. A message 
>>> that Beam will discontinue Python 2 support will encourage users to 
>>> migrate, therefore I also support Beam signing https://python3statement.org.
>>> - Having more usage statistics and feedback closer to 2020 can help us be 
>>> more confident in deciding when to stop Python 2 support.
>>>
>>> On Thu, Sep 19, 2019 at 6:05 PM Ahmet Altay  wrote:

 Thanks a lot for sharing your thoughts, I completely agree that we need to 
 minimize the burden on our users as much as possible. Especially in this 
 case when we are offering a robust python 3 solution just now. However I 
 do share the same concerns related to dependencies and tool chains. It 
 will be increasingly difficult for us to keep our code base compatible 
 with python2 and python3 over time. (To be very explicit, one of those 
 dependencies is Dataflow's python pre-portability workers.)

 On Thu, Sep 19, 2019 at 5:17 PM Maximilian Michels  wrote:
>
> Granted that we just have finalized the Python 3 support, we should
> allow time for it to mature and for users to make the switch.
>
> > Oh, and one more thing, I think it'd make sense for Apache Beam to
> > sign https://python3statement.org/. The promise is that we'd
> > discontinue Python 2 support *in* 2020, which is not committing us to
> > January if we're not ready

Re: Multiple iterations after GroupByKey with SparkRunner

2019-10-04 Thread Kenneth Knowles
The DoFnSignature is where the information "this ParDo only needs a
oneshot" would be recorded. This is what enables a runner to use the
GBKOneShot in place of a full GBK.

Kenn

On Fri, Oct 4, 2019 at 1:13 AM Reuven Lax  wrote:

> Yes - this approach puts compatibility checking on the user. However we
> could provide another way for the ParDo to "advertise" the set of states it
> will access. This is similar to what Kenn proposed: today there is a
> DoFnSignature object that is inferred
> reflectively based on annotations. However if there were an API to modify
> the DoFnSignature, then a DSL can simply use that API to list a set of
> state readers.
>
> Reuven
>
>
>
> On Fri, Oct 4, 2019 at 1:03 AM Jan Lukavský  wrote:
>
>> +1
>>
>> But I'd warn a little against this kind of absolute freedom for the
>> process() method. It should probably remain that all states will be created
>> before any element passes in, because otherwise it would be hard (if not
>> impossible) to do any compatibility checking of state upon pipeline
>> upgrades.
>>
>> Jan
>> On 10/4/19 9:47 AM, Reuven Lax wrote:
>>
>> IMO the fact that Stateful ParDo requires compile-time annotations isn't
>> the biggest problem - it's that it requires a static set of them, one for
>> each state. This is fine for specific user code, but we really should add
>> the ability to pass in a StateAccessor object to a DoFn that allows the
>> DoFn to dynamically create different state objects. Something like the
>> following:
>>
>> public void process(StateAccessor stateAccessor, ...) {
>>    stateAccessor.getValueState("state1", TypeDescriptors.ints()).get();
>>    stateAccessor.getMapState("state2", TypeDescriptors.strings(),
>>        TypeDescriptors.ints()).put();
>>    etc.
>> }
>>
>> This would be a bit less type safe than the current approach (someone
>> could try and fetch the same state twice with different types). However it
>> would be much friendlier to DSLs, and indeed any "generic" PTransform that
>> does not statically know all of its states at compile time.
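The type-safety caveat mentioned above (fetching the same state twice with
different types) can be made concrete with a small toy model. This is a
language-neutral sketch written in Python, not a Beam API: the `StateAccessor`
and `ValueState` classes below are in-memory stand-ins invented for this
example, and a runtime check is just one possible way to catch the mismatch.

```python
class ValueState:
    """Single-value state cell backed by a shared dict (toy stand-in)."""

    def __init__(self, store, name):
        self._store = store
        self._name = name

    def read(self):
        return self._store.get(self._name)

    def write(self, value):
        self._store[self._name] = value


class StateAccessor:
    """Hands out state cells by name and remembers the first declared type."""

    def __init__(self):
        self._declared_types = {}
        self._store = {}

    def get_value_state(self, name, value_type):
        # The first request fixes the type; later requests must agree.
        declared = self._declared_types.setdefault(name, value_type)
        if declared is not value_type:
            raise TypeError(
                "state %r was first requested as %s, now as %s"
                % (name, declared.__name__, value_type.__name__))
        return ValueState(self._store, name)


def process(state_accessor, element):
    """Count elements using state that is looked up dynamically by name."""
    count = state_accessor.get_value_state("count", int)
    count.write((count.read() or 0) + 1)
    return count.read()
```

With dynamic lookup, the mismatch can only be detected when the second
request happens, which is the loss of static safety described above.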
>>
>> I think we need similar functionality for timers.
>>
>> Reuven
>>
>> On Fri, Oct 4, 2019 at 12:36 AM Jan Lukavský  wrote:
>>
>>> > So to me the interesting part is that there is a DSL that wants to
>>> support primitives that are strictly weaker than Beam's, in order to *only*
>>> allow the oneshot path. Annotations are quite annoying for DSLs, as you may
>>> have noticed for state & timers, so that is not a good fit. But the
>>> concepts still work. I would suggest pivot this thread into how to allow a
>>> DSL builder to directly provide a DoFnInvoke with DoFnSignature in order to
>>> programmatically provide the same information that annotations are used.
>>> Essentially exposing an IR to DSL authors rather than forcing them to work
>>> with the source language meant for end users. Do you already have a
>>> solution for this today?
>>>
>>> We have discussed that this would be useful. Euphoria lacks stateful
>>> processing support mostly because stateful ParDo requires (compile-time)
>>> annotations. For that, we need exactly what you say: we need to be able
>>> to provide the runner directly with a DoFnSignature. Other solutions
>>> would be kind of hackish.
>>>
>>> On the other hand, this isn't directly related to the discussion about
>>> reiterations in GBK, is it? I think DoFnSignatures cannot help us here,
>>> because we need to affect the way GBK is translated in runner, not the
>>> ParDo. So it quite naturally leads to the RBK, or "streamed GBK". If we
>>> have a consensus on that, I can create JIRAs and move it forward.
>>>
>>> Jan
>>> On 10/3/19 7:19 PM, Kenneth Knowles wrote:
>>>
>>> On Tue, Oct 1, 2019 at 5:35 PM Robert Bradshaw 
>>> wrote:
>>>
 For this specific usecase, I would suggest this be done via
 PTransform URNs. E.g. one could have a GroupByKeyOneShot whose
 implementation is

 input
 .apply(GroupByKey.of())
 .apply(kv -> KV.of(kv.key(), kv.iterator()))
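The difference between a one-shot grouping and a re-iterable one can be seen
in plain Python. The following is a toy sketch, not Beam code and not the
proposed GroupByKeyOneShot: `group_by_key_one_shot` is a hypothetical helper
built on `itertools.groupby`, and it assumes the input pairs are already
sorted by key.

```python
from itertools import groupby
from operator import itemgetter


def group_by_key_one_shot(sorted_pairs):
    """Yield (key, values) where values is an iterator usable only once.

    Assumes sorted_pairs is sorted by key, as itertools.groupby requires.
    Each group must be consumed before advancing to the next key.
    """
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, (value for _, value in group)


pairs = [("a", 1), ("a", 2), ("b", 3)]

# A single pass over each group works as expected.
sums = [(key, sum(values)) for key, values in group_by_key_one_shot(pairs)]

# A second pass over the same group sees nothing: the iterator is exhausted.
key, values = next(group_by_key_one_shot(pairs))
first_pass = list(values)
second_pass = list(values)
```

A consumer that needs to iterate twice would have to buffer the values
itself, which is the cost a one-shot primitive makes explicit.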

>>>
>>> This is dual to what I clumsily was trying to say in my last paragraph.
>>> But I agree that ReduceByKey is better, if we were to add any new primitive
>>> transform. I very much dislike PCollection for just the reasons
>>> you also mention.
>>>
>>> I think the annotation route where @ProcessElement can accept a
>>> different type of element seems less intrusive and more flexible.
>>>
>>>
 On Tue, Oct 1, 2019 at 2:16 AM Jan Lukavský  wrote:

> The car analogy was meant to say, that in real world you have to make
> decision before you take any action. There is no retroactivity possible.
>
 Reuven pointed out, that it is possible (although it seems a little
> weird to me, but that is the only thing I can tell against it :-)), that
> the way a grouped PCollection is produced might be out of control of a
> consuming operator. One example of this might be, that the grouping is
> produced in a submodule (some librar

Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Robert Bradshaw
The artifact signatures and contents all look good to me. I've also
verified that the wheels work for the direct runner. However, I'm having an
issue with trying to run on dataflow with Python 3.6:

python -m apache_beam.examples.wordcount   --input
gs://clouddfe-robertwb/chicago_taxi_data/eval/data.csv   --output
gs://clouddfe-robertwb/test/xcounts.txt   --runner=Dataflow
--project=google.com:clouddfe
--temp_location=gs://clouddfe-robertwb/fn-api/tmp
--staging_location=gs://clouddfe-robertwb/tmp
--sdk_location=staging/apache-beam-2.16.0.zip
...
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py",
line 374, in exists
self.client.objects.Get(request)  # metadata
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py",
line 1100, in Get
download=download)
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/base_api.py",
line 729, in _RunMethod
http, http_request, **opts)
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/http_wrapper.py",
line 360, in MakeRequest
max_retry_wait, total_wait_sec))
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio_overrides.py",
line 43, in retry_func
return http_wrapper.HandleExceptionsAndRebuildHttpConnections(retry_args)
  File 
"/usr/local/google/home/robertwb/beam-release/release-verify/staging/test-venv/lib/python3.6/site-packages/apitools/base/py/http_wrapper.py",
line 294, in HandleExceptionsAndRebuildHttpConnections
retry_args.exc.status >= 500)):

Is this just me or a wider issue?

On Fri, Oct 4, 2019 at 10:27 AM Pablo Estrada  wrote:
>
> Hi all,
> I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it seems 
> like the user has a workaround - is that correct?
> If that's the case, then I vote +1.
>
> @Max - lmk if you'd like to discuss further, but for now my vote is on +1.
> Best
> -P.
>
> On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:
>>
>> +1 (forgot to vote)
>>
>> I also triggered Java Nexmark on direct, dataflow, spark and flink runner. 
>> Didn't see any performance regression from the dashboard 
>> (https://apache-beam-testing.appspot.com/dashboard-admin)
>>
>> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>>>
>>> Thanks for the validation work! I validated the following:
>>>
>>> - Java Quickstart on direct, dataflow,spark local, flink local runner
>>> - Java mobile gaming on direct and dataflow runner
>>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using 
>>> wheels/zip
>>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using wheels/zip 
>>> on direct and dataflow runner
>>>
>>> Mark
>>>
>>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:

 I see most of the release validations have been completed and marked in 
 the spreadsheet. Thank you all for doing that. If you have not 
 validated/voted yet please take a look at the release candidate.

 On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>
> I think there is a different reason why the release manager should 
> probably merge/approve all PRs that go into the release branch while the 
> release is in progress:
>
> If/when the need arises for another RC, then only those changes should be 
> included that are deemed blockers or explicitly agreed. Otherwise the 
> release can potentially be delayed by modifications that invalidate prior 
> verification or introduce new instability.


 I agree with this reasoning. It expresses my concern in a more clear way.

>
>
> Thomas
>
>
> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels  wrote:
>>
>>  > For the next time, may I suggest asking release manager to do the
>>  > merging to the release branch. We do not know whether there will be an
>>  > RC2 or not. And if there will not be an RC2 release branch as of now
>>  > does not directly correspond to what will be released.
>>
>> The ground truth for releases are the release tags, not the release
>> branches. Downstream projects should not depend on the release branches.
>> Release branches are merely important for the process of creating a
>> release, but they lose validity after the RC has been created and 
>> released.
>>
>> On 02.10.19 11:45, Ahmet Altay wrote:
>> > +1 (validated python quickstarts). Thank you Mark.
>> >
>> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
>> >
>> > Thanks for preparing the release, Mark! I would like to address
>> > https://issues.apache.

Re: NOTICE: New Python PreCommit jobs

2019-10-04 Thread Chad Dombrova
> I have a WiP PR to convert Beam to use pytest, but it's been stalled.
>

What would it take to get it back on track?


> Another nice thing about pytest is that you'll be able to tell which suite
> a test belongs to.
>

pytest has a lot of quality of life improvements over nose.  The biggest
and simplest one is that the test name that it prints is in the same format
as the runner expects for specifying individual tests to run, so you can
just copy and paste on the command line to run that one test.  Genius.
Also, since it uses directory names for tests and not module names, you can
tab complete.   The whole fixture concept is also great, since it gives you
a new axis for test composability and reuse, instead of just complex
sub-classing or copy-and-paste.   After switching to pytest we went through
our tests and replaced all of our horrible test mixins with fixtures and
the end result is much more legible and maintainable.  There's honestly
nothing I miss about nose.

-chad


Re: [VOTE] Sign a pledge to discontinue support of Python 2 in 2020.

2019-10-04 Thread Valentyn Tymofieiev
I also vote +1 and conclude the vote.

There are 23 approving votes, 6 of which come from Apache Beam PMC, and
there are no disapproving votes.
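
The tally above can be checked against the majority-approval rule cited in the original call for votes. The following is only a rough sketch, assuming "majority approval" means that +1 votes strictly outnumber -1 votes; the function name and the string encoding of votes are made up for illustration and are not part of any Apache tooling.

```python
from collections import Counter


def majority_approves(votes):
    """Return True when +1 votes strictly outnumber -1 votes.

    `votes` is an iterable of "+1" / "-1" strings. This encoding is a
    hypothetical one chosen for the example.
    """
    counts = Counter(votes)
    return counts["+1"] > counts["-1"]


# Tally reported in this thread: 23 approving votes, no disapproving votes.
votes = ["+1"] * 23
print(majority_approves(votes))  # True
```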

Thanks everyone.

On Wed, Oct 2, 2019 at 1:09 AM Mikhail Gryzykhin <
gryzykhin.mikh...@gmail.com> wrote:

> +1
>
> On Tue, Oct 1, 2019 at 6:24 PM Ankur Goenka  wrote:
>
>> +1
>>
>> On Tue, Oct 1, 2019 at 4:27 PM Ruoyun Huang  wrote:
>>
>>> +1
>>>
>>> On Tue, Oct 1, 2019 at 3:52 PM Rui Wang  wrote:
>>>
 +1

 I needed to use https://python3statement.org to access the website BTW
 (https, not http).


 -Rui

 On Tue, Oct 1, 2019 at 3:29 PM Cam Mach  wrote:

> +1
>
>
>
> On Tue, Oct 1, 2019 at 9:44 AM Udi Meiri  wrote:
>
>> +1
>>
>> On Tue, Oct 1, 2019 at 3:22 AM Łukasz Gajowy 
>> wrote:
>>
>>> +1
>>>
>>> wt., 1 paź 2019 o 11:29 Maximilian Michels 
>>> napisał(a):
>>>
 +1

 On 30.09.19 23:03, Reza Rokni wrote:
 > +1
 >
 > On Tue, 1 Oct 2019 at 13:54, Tanay Tummalapalli <ttanay...@gmail.com> wrote:
 >
 > +1
 >
 > On Tue, Oct 1, 2019 at 8:19 AM Suneel Marthi <smar...@apache.org> wrote:
 >
 > +1
 >
 > On Mon, Sep 30, 2019 at 10:33 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
 >
 > +1
 >
 > On Tue, Oct 1, 2019 at 9:44 AM Austin Bennett wrote:
 >
 > +1
 >
 > On Mon, Sep 30, 2019 at 5:22 PM Valentyn Tymofieiev <valen...@google.com> wrote:
 >
 > Hi everyone,
 >
 > Please vote whether to sign a pledge on behalf of Apache Beam to
 > sunset the Beam Python 2 offering (in new releases) in 2020 on
 > http://python3statement.org as follows:
 >
 > [ ] +1: Sign a pledge to discontinue support of Python 2 in Beam in 2020.
 > [ ] -1: Do not sign a pledge to discontinue support of Python 2 in Beam in 2020.
 >
 > The motivation and details for this vote were discussed in [1, 2].
 > Please follow up in [2] if you have any questions.
 >
 > This is a procedural vote [3] that will follow the majority approval
 > rules and will be open for at least 72 hours.
 >
 > Thanks,
 > Valentyn
 >
 > [1] https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
 > [2] https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E
 > [3] https://www.apache.org/foundation/voting.html
 >
 >

>>>
>>>
>>> --
>>> 
>>> Ruoyun  Huang
>>>
>>>


Re: outreachy intern

2019-10-04 Thread Rui Wang
Welcome Diksha!


Can you share your username on [1]?

[1]: https://jira.apache.org/jira/secure/Dashboard.jspa

-Rui

On Fri, Oct 4, 2019 at 9:44 AM Thomas Weise  wrote:

> Welcome, Diksha!
>
>
> On Fri, Oct 4, 2019 at 8:47 AM diksha gupta 
> wrote:
>
>> Hi, I am Diksha Gupta, outreachy intern.
>> I will work with your host on beamSQL.
>>
>


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Pablo Estrada
Hi all,
I looked at https://issues.apache.org/jira/browse/BEAM-8303, and it seems
like the user has a workaround - is that correct?
If that's the case, then I vote +1.

@Max - lmk if you'd like to discuss further, but for now my vote is on +1.
Best
-P.

On Fri, Oct 4, 2019 at 9:29 AM Mark Liu  wrote:

> +1 (forgot to vote)
>
> I also triggered Java Nexmark on the direct, dataflow, spark and flink runners.
> Didn't see any performance regression on the dashboard (
> https://apache-beam-testing.appspot.com/dashboard-admin)
>
> On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:
>
>> Thanks for the validation work! I validated following:
>>
>> - Java Quickstart on direct, dataflow, spark local, flink local runner
>> - Java mobile gaming on direct and dataflow runner
>> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
>> wheels/zip
>> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
>> wheels/zip on direct and dataflow runner
>>
>> Mark
>>
>> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>>
>>> I see most of the release validations have been completed and marked in
>>> the spreadsheet. Thank you all for doing that. If you have not
>>> validated/voted yet please take a look at the release candidate.
>>>
>>> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>>>
 I think there is a different reason why the release manager should
 probably merge/approve all PRs that go into the release branch while the
 release is in progress:

 If/when the need arises for another RC, then only those changes should
 be included that are deemed blockers or explicitly agreed. Otherwise the
 release can potentially be delayed by modifications that invalidate prior
 verification or introduce new instability.

>>>
>>> I agree with this reasoning. It expresses my concern in a more clear
>>> way.
>>>
>>>

 Thomas


 On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
 wrote:

>  > For the next time, may I suggest asking release manager to do the
>  > merging to the release branch. We do not know whether there will be
> an
>  > RC2 or not. And if there will not be an RC2 release branch as of now
>  > does not directly correspond to what will be released.
>
> The ground truth for releases are the release tags, not the release
> branches. Downstream projects should not depend on the release
> branches.
> Release branches are merely important for the process of creating a
> release, but they lose validity after the RC has been created and
> released.
>
> On 02.10.19 11:45, Ahmet Altay wrote:
> > +1 (validated python quickstarts). Thank you Mark.
> >
> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
> >
> > Thanks for preparing the release, Mark! I would like to address
> > https://issues.apache.org/jira/browse/BEAM-8303 in the release.
> I've
> > already merged the fix to the release-2.16.0 branch. If we do
> another
> > RC, we could include it. As a user is blocked on this, I would
> not vote
> > +1 for this RC, but I also do not want to block the release
> process.
> >
> >
> > Max, thank you for the clear communication for the importance and at
> the
> > same time non-blocking status of the issue.
> >
> > For the next time, may I suggest asking release manager to do the
> > merging to the release branch. We do not know whether there will be
> an
> > RC2 or not. And if there will not be an RC2 release branch as of now
> > does not directly correspond to what will be released.
> >
> >
> > On 01.10.19 09:18, Mark Liu wrote:
> >  > Hi everyone,
> >  >
> >  > Please review and vote on the release candidate #1 for the
> version
> >  > 2.16.0, as follows:
> >  > [ ] +1, Approve the release
> >  > [ ] -1, Do not approve the release (please provide specific
> comments)
> >  >
> >  >
> >  > The complete staging area is available for your review, which
> > includes:
> >  > * JIRA release notes [1],
> >  > * the official Apache source release to be deployed to
> > dist.apache.org 
> >  >  [2], which is signed with the key
> with
> >  > fingerprint C110B1C82074883A4241D977599D6305FF3ABB32 [3],
> >  > * all artifacts to be deployed to the Maven Central
> Repository [4],
> >  > * source code tag "v2.16.0-RC1" [5],
> >  > * website pull request listing the release [6], publishing
> the API
> >  > reference manual [7], and the blog post [8].
> >  > * Python artifacts are deployed along with the source release
> to the
> >  > dist.apache.org  <
> http://dist.apache.org>
>>>

ApacheCon Europe 2019 talks which are relevant to Apache Beam

2019-10-04 Thread myrle

Dear Apache Beam committers,

In a little over two weeks' time, ApacheCon Europe is taking place in
Berlin. Join us from October 22 to 24 for an exciting program and a lovely
get-together of the Apache Community.


We are also planning a hackathon.  If your project is interested in 
participating, please enter yourselves here: 
https://cwiki.apache.org/confluence/display/COMDEV/Hackathon


The following talks should be especially relevant for you:

 * https://aceu19.apachecon.com/session/apache-beam-running-big-data-pipelines-python-and-go-spark
 * https://aceu19.apachecon.com/session/patterns-and-anti-patterns-running-apache-bigdata-projects-kubernetes
 * https://aceu19.apachecon.com/session/fast-federated-sql-apache-calcite
 * https://aceu19.apachecon.com/session/open-source-big-data-tools-accelerating-physics-research-cern
 * https://aceu19.apachecon.com/session/ui-dev-big-data-world-using-open-source
 * https://aceu19.apachecon.com/session/apache-hivemall-meets-pyspark-scalable-machine-learning-hive-spark-and-python
 * https://aceu19.apachecon.com/session/running-facial-recognition-edge-apache-nifi-minifi

Furthermore there will be a whole conference track on community topics: 
Learn how to motivate users to contribute patches, how the board of 
directors works, how to navigate the Incubator and much more: ApacheCon 
Europe 2019 Community track 


Tickets are available here  – 
for Apache Committers we offer discounted tickets.  Prices will be going 
up on October 7th, so book soon.


Please also help spread the word and make ApacheCon Europe 2019 a success!

We’re looking forward to welcoming you at #ACEU19!

Best,

Your ApacheCon team



Re: outreachy intern

2019-10-04 Thread Thomas Weise
Welcome, Diksha!


On Fri, Oct 4, 2019 at 8:47 AM diksha gupta 
wrote:

> Hi, I am Diksha Gupta, outreachy intern.
> I will work with your host on beamSQL.
>


Re: NOTICE: New Python PreCommit jobs

2019-10-04 Thread Udi Meiri
I have a WiP PR to convert Beam to use pytest, but it's been stalled.
The nice thing about pytest-xdist is that it runs tests in a multi-process,
single-thread-per-process fashion, so one test isn't affected by another
changing some global setting.
The not-so-nice thing is that xdist adds some globals to the main session
that fail to pickle, so I'm having to remove save_main_session from our
tests first.
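
To illustrate the pickling problem in isolation, here is a small sketch that uses the standard `pickle` module as a stand-in. Beam's `save_main_session` mechanism may use a different pickler, and the function name and the fake session below are invented for the example. The sketch reports which entries of a namespace cannot be pickled, which is the kind of check that can help locate offending globals.

```python
import pickle
import threading


def unpicklable_names(namespace):
    """Return the sorted names in `namespace` whose values fail to pickle."""
    bad = []
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle raises several exception types
            bad.append(name)
    return sorted(bad)


# A hypothetical "main session": two plain values plus a lock, which
# the pickle module refuses to serialize.
fake_session = {
    "batch_size": 100,
    "runner_name": "direct",
    "session_lock": threading.Lock(),
}
print(unpicklable_names(fake_session))  # ['session_lock']
```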

Another nice thing about pytest is that you'll be able to tell which suite
a test belongs to.

On Wed, Oct 2, 2019 at 10:16 AM Chad Dombrova  wrote:

> Hi all,
> I've posted a new PR that just splits out the python lint job here:
> https://github.com/apache/beam/pull/9706
>
> I'll be running the seed job shortly unless anyone objects.
>
> -chad
>
>
> On Tue, Oct 1, 2019 at 9:04 PM Chad Dombrova  wrote:
>
>> I haven’t used nose’s parallel execution plugin, but I have used pytest
>> with xdist with success. If your tests are designed to run in any order and
>> are properly sandboxed to prevent crosstalk between concurrent runs, which
>> they *should* be, then in my experience it works very well.
>>
>>
>> On Fri, Sep 27, 2019 at 6:51 PM Kenneth Knowles  wrote:
>>
>>> Do things go wrong when nose is configured to use parallel execution?
>>>
>>> On Fri, Sep 27, 2019 at 5:09 PM Chad Dombrova  wrote:
>>>
 By the way, the outcome on this was that splitting the python precommit
 job into one job per python version resulted in increasing the total test
 completion time by 66%, which is obviously not good.  This is because we
 are using Gradle to run the python tests tasks in parallel (the jenkins VMs
 have 16 cores each, utilized across 2 slots, IIRC), but after the split
 there were only 1-2 gradle tasks per test.  Since the python test runner,
 nose, is currently not using parallel execution, there were not enough
 concurrent tasks to make proper use of the VM's CPUs.

 tl;dr  I'm going to create a followup PR to split out just the Lint job
 (same as we have Spotless for Java).   This is our best ROI for now.
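
The utilization argument above can be made concrete with a little arithmetic. This is only a back-of-the-envelope sketch under stated assumptions: cores are split evenly between the slots of a VM, and each Gradle task keeps exactly one core busy because the test runner is not parallel. The function name is hypothetical.

```python
def core_utilization(cores_per_vm, slots_per_vm, tasks_per_job):
    """Fraction of a slot's cores kept busy by single-threaded tasks."""
    cores_per_slot = cores_per_vm // slots_per_vm
    busy = min(tasks_per_job, cores_per_slot)
    return busy / cores_per_slot


# 16 cores shared by 2 slots gives 8 cores per slot.
print(core_utilization(16, 2, 2))  # 0.25
print(core_utilization(16, 2, 8))  # 1.0
```

With only 1-2 tasks per job, most of a slot's cores sit idle, which is consistent with the slowdown reported after the split.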

 -chad


 On Fri, Sep 27, 2019 at 3:27 PM Kyle Weaver 
 wrote:

> > Do we have good pypi caching?
>
> Building Python SDK harness containers takes 2 mins each (times 4, the
> number of versions) on my machine, even if nothing has changed. But we're
> already paying that cost, so I don't think splitting the jobs should make
> it any worse. (https://issues.apache.org/jira/browse/BEAM-8277 if
> anyone has any ideas)
>
> Kyle Weaver | Software Engineer | github.com/ibzib |
> kcwea...@google.com
>
>
> On Wed, Sep 25, 2019 at 11:21 AM Pablo Estrada 
> wrote:
>
>> Thanks Chad, and thank you for notifying on the dev list.
>>
>> On Wed, Sep 25, 2019 at 10:59 AM Kenneth Knowles 
>> wrote:
>>
>>> Nice.
>>>
>>> Do we have good pypi caching? If not this could add a lot of
>>> overhead to our already-backed-up CI queue. (btw I still think your
>>> change is good, and just makes proper caching more important)
>>>
>>> Kenn
>>>
>>> On Tue, Sep 24, 2019 at 9:55 PM Chad Dombrova 
>>> wrote:
>>>
 Hi all,
 I'm working to make the CI experience with python a bit better, and
 my current initiative is splitting up the giant Python PreCommit job
 into 5 separate jobs for Lint, Py2, Py3.5, Py3.6, and Py3.7.

 Around 11am Pacific time tomorrow I'm going to initiate the seed
 jobs, at which point all PRs will start to run the new precommit jobs.
 It's a bit of a chicken-and-egg scenario with testing this, so there
 could be issues that pop up after the seed jobs are created, but I'll be
 working to resolve those issues as quickly as possible.

 If you run into problems because of this change, please let me know
 on the github PR.

 Here's the PR: https://github.com/apache/beam/pull/9642
 Here's the Jira: https://issues.apache.org/jira/browse/BEAM-8213#

 The upshot is that after this is done you'll get better feedback on
 python test failures!

 Let me know if you have any concerns.

 thanks,
 chad






Re: Introduction to the mailing list

2019-10-04 Thread Thomas Weise
Welcome, Manuela!

For getting familiar with the Beam development environment in general, I
would recommend to take a look at:

https://beam.apache.org/get-started/quickstart-py/

https://cwiki.apache.org/confluence/display/BEAM/Nexmark

For contributing and collaborating in general, please take a look at:

https://beam.apache.org/contribute/

There is a lot more useful information for contributors on our cwiki:

https://cwiki.apache.org/confluence/display/BEAM/Apache+Beam

And don't hesitate to reach out anytime here on this list with questions.

Thanks,
Thomas


On Fri, Oct 4, 2019 at 4:53 AM Manuela Chamda Tchakoute <
chamdamanu...@gmail.com> wrote:

> Hello.
>
>  My name is Chamda Manuela from the University of Buea, Cameroon. I am new
> to open source and comfortable with the Python programming language. I would
> like to contribute to the outreachy project "Extend the Nexmark
> Benchmarking suite in Apache Beam to include python and portable runners".
>
> I will be glad if someone could help me with a step by step guide to get
> started.
>
>
> Thank you.
>


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Mark Liu
+1 (forgot to vote)

I also triggered Java Nexmark on the direct, dataflow, spark and flink runners.
Didn't see any performance regression on the dashboard (
https://apache-beam-testing.appspot.com/dashboard-admin)

On Fri, Oct 4, 2019 at 8:23 AM Mark Liu  wrote:

> Thanks for the validation work! I validated following:
>
> - Java Quickstart on direct, dataflow, spark local, flink local runner
> - Java mobile gaming on direct and dataflow runner
> - Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
> wheels/zip
> - Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using
> wheels/zip on direct and dataflow runner
>
> Mark
>
> On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:
>
>> I see most of the release validations have been completed and marked in
>> the spreadsheet. Thank you all for doing that. If you have not
>> validated/voted yet please take a look at the release candidate.
>>
>> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>>
>>> I think there is a different reason why the release manager should
>>> probably merge/approve all PRs that go into the release branch while the
>>> release is in progress:
>>>
>>> If/when the need arises for another RC, then only those changes should
>>> be included that are deemed blockers or explicitly agreed. Otherwise the
>>> release can potentially be delayed by modifications that invalidate prior
>>> verification or introduce new instability.
>>>
>>
>> I agree with this reasoning. It expresses my concern in a more clear way.
>>
>>
>>>
>>> Thomas
>>>
>>>
>>> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels 
>>> wrote:
>>>
  > For the next time, may I suggest asking release manager to do the
  > merging to the release branch. We do not know whether there will be
 an
  > RC2 or not. And if there will not be an RC2 release branch as of now
  > does not directly correspond to what will be released.

 The ground truth for releases are the release tags, not the release
 branches. Downstream projects should not depend on the release
 branches.
 Release branches are merely important for the process of creating a
 release, but they lose validity after the RC has been created and
 released.

 On 02.10.19 11:45, Ahmet Altay wrote:
 > +1 (validated python quickstarts). Thank you Mark.
 >
 > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
 >
 > Thanks for preparing the release, Mark! I would like to address
 > https://issues.apache.org/jira/browse/BEAM-8303 in the release.
 I've
 > already merged the fix to the release-2.16.0 branch. If we do
 another
 > RC, we could include it. As a user is blocked on this, I would
 not vote
 > +1 for this RC, but I also do not want to block the release
 process.
 >
 >
 > Max, thank you for the clear communication for the importance and at
 the
 > same time non-blocking status of the issue.
 >
 > For the next time, may I suggest asking release manager to do the
 > merging to the release branch. We do not know whether there will be
 an
 > RC2 or not. And if there will not be an RC2 release branch as of now
 > does not directly correspond to what will be released.
 >
 >
 > On 01.10.19 09:18, Mark Liu wrote:
 >  > Hi everyone,
 >  >
 >  > Please review and vote on the release candidate #1 for the
 version
 >  > 2.16.0, as follows:
 >  > [ ] +1, Approve the release
 >  > [ ] -1, Do not approve the release (please provide specific
 comments)
 >  >
 >  >
 >  > The complete staging area is available for your review, which
 > includes:
 >  > * JIRA release notes [1],
 >  > * the official Apache source release to be deployed to
 > dist.apache.org 
 >  >  [2], which is signed with the key
 with
 >  > fingerprint C110B1C82074883A4241D977599D6305FF3ABB32 [3],
 >  > * all artifacts to be deployed to the Maven Central Repository
 [4],
 >  > * source code tag "v2.16.0-RC1" [5],
 >  > * website pull request listing the release [6], publishing the
 API
 >  > reference manual [7], and the blog post [8].
 >  > * Python artifacts are deployed along with the source release
 to the
 >  > dist.apache.org  <
 http://dist.apache.org>
 > [2].
 >  > * Validation sheet with a tab for 2.16.0 release to help with
 > validation
 >  > [9].
 >  > * Docker images published to Docker Hub [10].
 >  >
 >  > The vote will be open for at least 72 hours. It is adopted by
 > majority
 >  > approval, with at least 3 PMC affirmative votes.
 >  >
 >  > Thanks,
 >  > Mark Liu, Releas

outreachy intern

2019-10-04 Thread diksha gupta
Hi, I am Diksha Gupta, outreachy intern.
I will work with your host on beamSQL.


Re: [VOTE] Release 2.16.0, release candidate #1

2019-10-04 Thread Mark Liu
Thanks for the validation work! I validated following:

- Java Quickstart on direct, dataflow, spark local, flink local runner
- Java mobile gaming on direct and dataflow runner
- Python Quickstart in batch and streaming in py2/3.5/3.6/3.7 using
wheels/zip
- Python Mobile Game in batch/streaming in py2/3.5/3.6/3.7 using wheels/zip
on direct and dataflow runner

Mark

On Thu, Oct 3, 2019 at 6:57 PM Ahmet Altay  wrote:

> I see most of the release validations have been completed and marked in
> the spreadsheet. Thank you all for doing that. If you have not
> validated/voted yet please take a look at the release candidate.
>
> On Thu, Oct 3, 2019 at 7:59 AM Thomas Weise  wrote:
>
>> I think there is a different reason why the release manager should
>> probably merge/approve all PRs that go into the release branch while the
>> release is in progress:
>>
>> If/when the need arises for another RC, then only those changes should be
>> included that are deemed blockers or explicitly agreed. Otherwise the
>> release can potentially be delayed by modifications that invalidate prior
>> verification or introduce new instability.
>>
>
> I agree with this reasoning. It expresses my concern in a more clear way.
>
>
>>
>> Thomas
>>
>>
>> On Thu, Oct 3, 2019 at 3:12 AM Maximilian Michels  wrote:
>>
>>>  > For the next time, may I suggest asking release manager to do the
>>>  > merging to the release branch. We do not know whether there will be an
>>>  > RC2 or not. And if there will not be an RC2 release branch as of now
>>>  > does not directly correspond to what will be released.
>>>
>>> The ground truth for releases are the release tags, not the release
>>> branches. Downstream projects should not depend on the release branches.
>>> Release branches are merely important for the process of creating a
>>> release, but they lose validity after the RC has been created and
>>> released.
>>>
>>> On 02.10.19 11:45, Ahmet Altay wrote:
>>> > +1 (validated python quickstarts). Thank you Mark.
>>> >
>>> > On Wed, Oct 2, 2019 at 10:49 AM Maximilian Michels wrote:
>>> >
>>> > Thanks for preparing the release, Mark! I would like to address
>>> > https://issues.apache.org/jira/browse/BEAM-8303 in the release. I've
>>> > already merged the fix to the release-2.16.0 branch. If we do another
>>> > RC, we could include it. As a user is blocked on this, I would not vote
>>> > +1 for this RC, but I also do not want to block the release process.
>>> >
>>> >
>>> > Max, thank you for the clear communication for the importance and at the
>>> > same time non-blocking status of the issue.
>>> >
>>> > For the next time, may I suggest asking release manager to do the
>>> > merging to the release branch. We do not know whether there will be an
>>> > RC2 or not. And if there will not be an RC2 release branch as of now
>>> > does not directly correspond to what will be released.
>>> >
>>> >
>>> > On 01.10.19 09:18, Mark Liu wrote:
>>> >  > Hi everyone,
>>> >  >
>>> >  > Please review and vote on the release candidate #1 for the version
>>> >  > 2.16.0, as follows:
>>> >  > [ ] +1, Approve the release
>>> >  > [ ] -1, Do not approve the release (please provide specific comments)
>>> >  >
>>> >  >
>>> >  > The complete staging area is available for your review, which includes:
>>> >  > * JIRA release notes [1],
>>> >  > * the official Apache source release to be deployed to dist.apache.org
>>> >  > [2], which is signed with the key with
>>> >  > fingerprint C110B1C82074883A4241D977599D6305FF3ABB32 [3],
>>> >  > * all artifacts to be deployed to the Maven Central Repository [4],
>>> >  > * source code tag "v2.16.0-RC1" [5],
>>> >  > * website pull request listing the release [6], publishing the API
>>> >  > reference manual [7], and the blog post [8].
>>> >  > * Python artifacts are deployed along with the source release to the
>>> >  > dist.apache.org [2].
>>> >  > * Validation sheet with a tab for 2.16.0 release to help with validation
>>> >  > [9].
>>> >  > * Docker images published to Docker Hub [10].
>>> >  >
>>> >  > The vote will be open for at least 72 hours. It is adopted by majority
>>> >  > approval, with at least 3 PMC affirmative votes.
>>> >  >
>>> >  > Thanks,
>>> >  > Mark Liu, Release Manager
>>> >  >
>>> >  > [1]
>>> >  >
>>> >
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12345494
>>> >  > [2] https://dist.apache.org/repos/dist/dev/beam/2.16.0/
>>> >  > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >  > [4]
>>> >
>>> https://repository.apache.org/content/repositories/orgapachebeam-1085/
>>> >  > [5]

Introduction to the mailing list

2019-10-04 Thread Manuela Chamda Tchakoute
Hello.

 My name is Chamda Manuela from the University of Buea, Cameroon. I am new
to open source and comfortable with the Python programming language. I would
like to contribute to the outreachy project "Extend the Nexmark
Benchmarking suite in Apache Beam to include python and portable runners".

I will be glad if someone could help me with a step by step guide to get
started.


Thank you.