[ANNOUNCE] Apache Beam 2.38.0 Released

2022-04-20 Thread Daniel Oliveira
The Apache Beam team is pleased to announce the release of version 2.38.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on the
Beam blog: https://beam.apache.org/blog/beam-2.38.0/

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.38.0.

-- Daniel Oliveira, on behalf of The Apache Beam team


Re: Lots of branches

2021-05-07 Thread Daniel Oliveira
Agreed, it would be a good idea to clean it out a bit. I went and deleted
my own unnecessary branches.

For anyone who needs it, here's a page listing all your branches in the
repo: https://github.com/apache/beam/branches/yours

On Fri, May 7, 2021 at 5:36 PM Ahmet Altay  wrote:

> Hello all,
>
> Our git repo has lots of branches for cherry picks, patches, reverts, etc.
> I believe these are artifacts of GitHub's easy-to-use online editor. If you
> no longer need those, could you please clean them up?
>
> Have a great weekend!
> Ahmet
>


Re: [Call for items] Beam October 2020 Newsletter

2020-10-26 Thread Daniel Oliveira
Hi Brittany,

I gave the newsletter a look, but I still have two questions about it:

1. Who's the intended audience for this newsletter, devs or users? Because
I imagine there would probably be a difference in content between the two.
Dev news might include infrastructure changes that are irrelevant to users
(like news about our testing, or our repo, etc.), as well as mentioning
changes that might be in the master branch but have not yet been released.
And vice versa, if it's aimed at users it will probably need to be
constrained to features that have been released, so users don't expect
features they can't actually use yet.

2. Along similar lines, are SDK feature announcements appropriate for this
newsletter? (For example, announcing that SplittableDoFn is available for
implementing sources in the Go SDK.) If the newsletter is aimed at users, I
imagine that new features are probably more appropriate in the Beam release
changelog (with maybe some highlights called out here). On the other hand,
if it's dev focused, then this seems like a good platform to share progress
that we as a community have made on various projects; I'm just
not sure what section would be most appropriate for it.

Aside from those two questions, this newsletter looks really cool! I
especially like the social media and online engagement sections. For
someone like me who rarely pays attention to social media, it's nice to see
a summary like that.

Thanks,
Daniel Oliveira

On Mon, Oct 26, 2020 at 11:06 AM Brittany Hermann 
wrote:

> Hi everyone,
>
> I am thinking of creating a monthly Beam Community newsletter like this:
> https://docs.google.com/document/d/1_t6xKoOQVwgn2edmRVh1ViudmbnNM3BwZyguKAwwjfA/edit?usp=sharing
>
> My intention would be to send this out once a month. Could you please add
> in any updates that you would like to share with the community by 10/28 at
> 11:59 pm PST? I am planning to edit and send the final version through the
> mailing list on 10/30.
>
> Thank you!
> -Brittany Hermann
>
>
>


[ANNOUNCE] Beam 2.24.0 Released

2020-09-18 Thread Daniel Oliveira
The Apache Beam team is pleased to announce the release of version 2.24.0.

Apache Beam is an open source unified programming model to define and
execute data processing pipelines, including ETL, batch and stream
(continuous) processing. See: https://beam.apache.org

You can download the release here:

https://beam.apache.org/get-started/downloads/

This release includes bug fixes, features, and improvements detailed on
the Beam blog: https://beam.apache.org/blog/beam-2.24.0/

Thanks to everyone who contributed to this release, and we hope you enjoy
using Beam 2.24.0.

-- Daniel Oliveira, on behalf of The Apache Beam team


[RELEASE VOTE RESULT] Release 2.24.0, candidate #3

2020-09-15 Thread Daniel Oliveira
I'm happy to announce that we have approved the 2.24.0 release.

There are 5 approving votes, 3 of which are binding:
* Ahmet Altay
* Robert Bradshaw
* Thomas Weise

Thanks, everyone, for your help preparing the release.

I'm going to finalize the release and send out the
official release announcement once it is available.


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-09-14 Thread Daniel Oliveira
I just sent out an update on the RC3 vote thread, and in it I mentioned
some issues I had with the Python RC validation Jenkins target
<https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/>. It
seemed appropriate to shunt that list of grievances over here instead
of cluttering up the vote thread, since I've been using this thread more
for general progress updates.

Frankly, that Jenkins target is really unwieldy for RC validation. It took
me like 2 days to finally get a useful signal from it for a variety of
reasons, the big one being that the target takes over 4 hours to run from
end to end, and the quickstarts and mobile gaming examples are interspersed
with each other, despite the validation sheet separating them. So if I get
a failure and I try a fix, it takes an extremely long time to actually get
a signal on it. Plus, the whole thing runs sequentially, so if one test
fails, none of the subsequent tests run at all, which makes debugging even
more of a pain.

Finally, the mobile examples are all used with a persistent BQ table on
apache-beam-testing instead of creating and deleting temporary tables. This
means that the examples are used with already populated tables, which
messes with test results (we can't tell if the example failed to write to
the table since it's already populated), and it means that
apache-beam-testing has constantly growing BQ tables hogging up space.

I worked around those issues for this release, but I think this target is a
good candidate to be refactored for future release managers (along with a
bunch of other little pain points I've been writing down). It shouldn't be
too hard to split it into two Jenkins targets, one for the quickstarts and
one for the mobile game examples, to adjust it so that processing continues
even if there's one failure, and to adjust it so that it generates
temporary BQ tables and deletes them once it's done.

I think I can figure those changes out myself, but if anyone has expertise
working with these tests and can find the time to help out (or make
the changes yourself), I would really appreciate it. While I can manage it,
it would probably get done faster and with better quality by someone who's
already familiar with these tests.

On Mon, Aug 31, 2020 at 10:25 PM Daniel Oliveira 
wrote:

> The first release candidate is out. It took longer than I expected, partly
> because of a few bits of user-unfriendliness, but mostly because of bugs
> caused by my accidentally using JDK 11 with Gradle instead of JDK 8. I'm writing that
> down (along with all the other little difficulties I ran into) to make
> improvements to the release guide for the next release manager.
>
> Anyway, I sent out a thread for voting on the RC, so any issues with it
> can be mentioned in that thread. I just wanted to use this one as a
> progress report on my release experience, in case anyone is wondering what
> caused the release candidate to take so long.
>
> On Wed, Aug 12, 2020 at 6:09 PM Daniel Oliveira 
> wrote:
>
>> Release branch has been cut
>> <https://github.com/apache/beam/tree/release-2.24.0>. Reminder: Don't
>> commit to the release branch directly. If you have a fix for a
>> release-blocking issue, contact me to include it as a cherry pick.
>>
>> Next steps in the process are for me to verify the release branch and to
>> triage release-blocking bugs
>> <https://issues.apache.org/jira/projects/BEAM/versions/12347146>. I've
>> already begun going through bugs and have been leaving comments, so if you
>> have any release-blocking bugs on Jira, please check them.
>>
>> If you have an existing bug that you believe is release-blocking, check
>> it against the requirements in the Beam release guide
>> <https://beam.apache.org/contribute/release-guide/#6-triage-release-blocking-issues-in-jira>
>> first, and if you believe a cherry-pick would be accepted then mark the
>> Jira with "2.24.0" as the fix version and send me the PR that will need
>> to be cherry-picked.
>>
>> On Tue, Aug 11, 2020 at 6:59 PM Daniel Oliveira 
>> wrote:
>>
>>> I'd like to send out a last minute reminder to fill out CHANGES.md
>>> <https://github.com/apache/beam/blob/master/CHANGES.md> with any major
>>> changes that are going to be in 2.24.0. If you need a quick review for
>>> that, just add me as a reviewer to your PR (GitHub username is "youngoli").
>>> I'll keep an eye out for those until around 5 PM.
>>>
>>> On another note, I need some help with setup from the release guide
>>> <https://beam.apache.org/contribute/release-guide/#one-time-setup-instructions>
>>> :
>>> 1. I need someone to add me as a maint

Re: [VOTE] Release 2.24.0, release candidate #3

2020-09-14 Thread Daniel Oliveira
Hey everyone,

I finally got a decent enough signal from the Python release validations
running on Jenkins
<https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/> to
finish up the validation I signed up for. I have some grievances with that
target, but I'll shunt that off to another thread and keep the vote thread
uncluttered.

Looking at the validation sheet
<https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331>
it seems that we've tested the majority of targets, with the exception of
some Flink/Spark tests and Nexmark tests. We also already have 3
binding +1s on this thread. So *I'm planning on closing the vote tomorrow*,
leaving at least 24 hours from this email for any last minute testing,
objections, or finishing discussions. If anyone has any concerns and wants
me to hold off on the release for a bit longer, this is the time to mention
them.

Thanks,
Daniel Oliveira

On Thu, Sep 10, 2020 at 4:59 PM Thomas Weise  wrote:

> +1 (binding)
>
> Rebased fork and ran internal performance tests.
>
> While doing so, I ran into the unit test issue below with the fn_runner
> (Python direct runner), which did not occur with 2.21 [1]. That processing
> time timers are not supported wasn't an issue previously, because the
> timer, though declared, wasn't exercised in the unit test.
>
> Is there a plan/JIRA to support processing time timers with the direct
> runner?
>
> Thanks,
> Thomas
>
>
> [1]
> https://gist.github.com/tweise/6f8ca6341711f579b0ed9943b8f25138#file-synthetic_stateful-py-L250
>
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:555: in __exit__
>     self.result = self.run()
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:521: in run
>     allow_proto_holders=True).run(False)
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:534: in run
>     return self.runner.run_pipeline(self, self._options)
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/direct/direct_runner.py:119: in run_pipeline
>     return runner.run_pipeline(pipeline, options)
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:176: in run_pipeline
>     pipeline.to_runner_api(default_environment=self._default_environment))
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:182: in run_via_runner_api
>     self._check_requirements(pipeline_proto)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> self =  object at 0x7fd2a896b400>
> pipeline_proto = components {
>   transforms {
> key: "ref_AppliedPTransform_AppliedPTransform_1"
> value {
>   subtransforms: "ref...}
> }
> root_transform_ids: "ref_AppliedPTransform_AppliedPTransform_1"
> requirements: "beam:requirement:pardo:stateful:v1"
>
>
> def _check_requirements(self, pipeline_proto):
>   """Check that this runner can satisfy all pipeline requirements."""
>   supported_requirements = set(self.supported_requirements())
>   for requirement in pipeline_proto.requirements:
>     if requirement not in supported_requirements:
>       raise ValueError(
>           'Unable to run pipeline with requirement: %s' % requirement)
>   for transform in pipeline_proto.components.transforms.values():
>     if transform.spec.urn == common_urns.primitives.TEST_STREAM.urn:
>       raise NotImplementedError(transform.spec.urn)
>     elif transform.spec.urn in translations.PAR_DO_URNS:
>       payload = proto_utils.parse_Bytes(
>           transform.spec.payload, beam_runner_api_pb2.ParDoPayload)
>       for timer in payload.timer_family_specs.values():
>         if timer.time_domain != beam_runner_api_pb2.TimeDomain.EVENT_TIME:
> >         raise NotImplementedError(timer.time_domain)
> E         NotImplementedError: 2
>
> /code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:283: NotImplementedError
>
>
>
> On Thu, Sep 10, 2020 at 4:41 PM Robert Bradshaw 
> wrote:
>
>> Given the additional information, I am upgrading my vote to +1 (binding)
>> based on my prior analysis.
>>
>> On Thu, Sep 10, 2020 at 4:14 PM Kyle Weaver  wrote:
>>
>>> +1 (non-binding)
>>>
>>> Validated wordcount with Python 3.7.8 and Flink 1.10.0 (both loopback
>>> and using the Docker image). Also Python 3.7.8 loopback with an embedded
>>> Spark cluster.
>>>
>>> On Thu,

Re: [VOTE] Release 2.24.0, release candidate #3

2020-09-10 Thread Daniel Oliveira
By the way, most of the validation so far has covered Direct runner and
Dataflow, but Flink and Spark still have little validation, so if anyone
can help with those it will help speed up the release.

On Thu, Sep 10, 2020 at 2:12 PM Daniel Oliveira 
wrote:

> So I tracked the --temp_location issue down to
> https://github.com/apache/beam/pull/12203 and asked @Pablo Estrada
> and @Chamikara Jayalath about
> it. It's not exactly a bug, but an intended change in requirements for
> WriteToBigQuery, so the only fix I'll need to do is update the test script
> with the appropriate flag, which should be easy. It also won't require
> building a new release candidate.
>
> There is a possibility that user pipelines will break if they're using
> BigQuery with the Python Direct Runner, so I'll add a note to the changelog
> about it, but I don't think the change is significant enough to need
> anything beyond that.
>
> On Thu, Sep 10, 2020 at 1:47 PM Chamikara Jayalath 
> wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>> Cham
>>
>> On Thu, Sep 10, 2020 at 11:26 AM Ahmet Altay  wrote:
>>
>>> +1 - validated py3 quickstarts. The problem I mentioned earlier is
>>> resolved.
>>>
>>> On Wed, Sep 9, 2020 at 7:46 PM Daniel Oliveira 
>>> wrote:
>>>
>>>> Good news: According to
>>>> https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/188/consoleFull
>>>> the Streaming Wordcount quickstart works for Dataflow with Python 2.7.
>>>> So it looks like the container issue might be fixed.
>>>>
>>>> Bad news: That same Jenkins job failed on "Running HourlyTeamScore
>>>> example with DirectRunner" because it's missing a --temp_location flag,
>>>> despite using the DirectRunner. This looks like a bug, but I'm still
>>>> investigating whether it'll need another cherry-pick and RC to fix or if
>>>> the validation script just needs to be updated. I'll update the thread if I
>>>> find anything.
>>>>
>>>
>>> Probably it does not require a cherry-pick. We have not validated that
>>> workflow in the past few releases.
>>>
>>>
>>>>
>>>> On Wed, Sep 9, 2020 at 4:58 PM Daniel Oliveira 
>>>> wrote:
>>>>
>>>>> The Dataflow Python Batch worker issue should be fixed now. I tried
>>>>> verifying it myself via the rc validation script, but I've been having
>>>>> some trouble with the GCP authentication so if someone else can
>>>>> validate it, that would be a big help.
>>>>>
>>>>> On Tue, Sep 8, 2020 at 5:51 PM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> I verified the signatures and all the artifacts are correct, and
>>>>>> tested a wheel in a fresh virtual environment. It'd be good to see the
>>>>>> Dataflow issue confirmed as fixed though.
>>>>>>
>>>>>> On Tue, Sep 8, 2020 at 5:17 PM Valentyn Tymofieiev <
>>>>>> valen...@google.com> wrote:
>>>>>>
>>>>>>> This error comes from the Dataflow Python Batch worker.
>>>>>>>
>>>>>>> Streaming workflows use sdk worker, which is provided by apache-beam
>>>>>>> library, so the versions will match.
>>>>>>>
>>>>>>> The error should be fixed by setting the correct Dataflow worker
>>>>>>> version in Dataflow containers, and does not affect Beam RC.
>>>>>>>
>>>>>>> On Tue, Sep 8, 2020 at 4:52 PM Ahmet Altay  wrote:
>>>>>>>
>>>>>>>> -1 - I validated py3 quickstarts on dataflow and direct runner. I
>>>>>>>> ran into 1 issue with batch workflows on dataflow:
>>>>>>>>
>>>>>>>> "RuntimeError: Beam SDK base version 2.24.0 does not match Dataflow
>>>>>>>> Python worker version 2.24.0.dev. Please check Dataflow worker
>>>>>>>> startup logs and make sure that correct version of Beam SDK is
>>>>>>>> installed."
>>>>>>>>
>>>>>>>> It seems like the batch worker needs to be rebuilt. Not sure why
>>>>>>>> the streaming worker did not fail (does it have the correct version? or
>>>>>>>> does it not have the same check

Re: [VOTE] Release 2.24.0, release candidate #3

2020-09-10 Thread Daniel Oliveira
So I tracked the --temp_location issue down to
https://github.com/apache/beam/pull/12203 and asked @Pablo Estrada
and @Chamikara Jayalath about
it. It's not exactly a bug, but an intended change in requirements for
WriteToBigQuery, so the only fix I'll need to do is update the test script
with the appropriate flag, which should be easy. It also won't require
building a new release candidate.

There is a possibility that user pipelines will break if they're using
BigQuery with the Python Direct Runner, so I'll add a note to the changelog
about it, but I don't think the change is significant enough to need
anything beyond that.
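
As an illustration of the test-script fix described above, here is a minimal sketch of a preflight check that fails fast when `--temp_location` is missing, rather than waiting for the pipeline to fail. It uses only the standard library; `find_temp_location` and `check_flags` are hypothetical helpers (not part of Beam or of the actual validation script), and the bucket path in the usage note is a placeholder.

```python
import argparse
from typing import List, Optional


def find_temp_location(argv: List[str]) -> Optional[str]:
    """Return the value of --temp_location if it was passed, else None."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--temp_location", default=None)
    # parse_known_args ignores the other pipeline flags (e.g. --runner).
    known, _unknown = parser.parse_known_args(argv)
    return known.temp_location


def check_flags(argv: List[str]) -> None:
    """Exit with a clear message before launching if the flag is missing."""
    if not find_temp_location(argv):
        raise SystemExit("--temp_location is required for this example")
```

For example, `check_flags(["--runner", "DirectRunner", "--temp_location", "gs://my-bucket/tmp"])` returns normally, while the same call without the flag exits with the message above.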

On Thu, Sep 10, 2020 at 1:47 PM Chamikara Jayalath 
wrote:

> +1 (non-binding)
>
> Thanks,
> Cham
>
> On Thu, Sep 10, 2020 at 11:26 AM Ahmet Altay  wrote:
>
>> +1 - validated py3 quickstarts. The problem I mentioned earlier is
>> resolved.
>>
>> On Wed, Sep 9, 2020 at 7:46 PM Daniel Oliveira 
>> wrote:
>>
>>> Good news: According to
>>> https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/188/consoleFull
>>> the Streaming Wordcount quickstart works for Dataflow with Python 2.7.
>>> So it looks like the container issue might be fixed.
>>>
>>> Bad news: That same Jenkins job failed on "Running HourlyTeamScore
>>> example with DirectRunner" because it's missing a --temp_location flag,
>>> despite using the DirectRunner. This looks like a bug, but I'm still
>>> investigating whether it'll need another cherry-pick and RC to fix or if
>>> the validation script just needs to be updated. I'll update the thread if I
>>> find anything.
>>>
>>
>> Probably it does not require a cherry-pick. We have not validated that
>> workflow in the past few releases.
>>
>>
>>>
>>> On Wed, Sep 9, 2020 at 4:58 PM Daniel Oliveira 
>>> wrote:
>>>
>>>> The Dataflow Python Batch worker issue should be fixed now. I tried
>>>> verifying it myself via the rc validation script, but I've been having some
>>>> trouble with the GCP authentication so if someone else can validate it,
>>>> that would be a big help.
>>>>
>>>> On Tue, Sep 8, 2020 at 5:51 PM Robert Bradshaw 
>>>> wrote:
>>>>
>>>>> I verified the signatures and all the artifacts are correct, and
>>>>> tested a wheel in a fresh virtual environment. It'd be good to see the
>>>>> Dataflow issue confirmed as fixed though.
>>>>>
>>>>> On Tue, Sep 8, 2020 at 5:17 PM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> This error comes from the Dataflow Python Batch worker.
>>>>>>
>>>>>> Streaming workflows use sdk worker, which is provided by apache-beam
>>>>>> library, so the versions will match.
>>>>>>
>>>>>> The error should be fixed by setting the correct Dataflow worker
>>>>>> version in Dataflow containers, and does not affect Beam RC.
>>>>>>
>>>>>> On Tue, Sep 8, 2020 at 4:52 PM Ahmet Altay  wrote:
>>>>>>
>>>>>>> -1 - I validated py3 quickstarts on dataflow and direct runner. I
>>>>>>> ran into 1 issue with batch workflows on dataflow:
>>>>>>>
>>>>>>> "RuntimeError: Beam SDK base version 2.24.0 does not match Dataflow
>>>>>>> Python worker version 2.24.0.dev. Please check Dataflow worker
>>>>>>> startup logs and make sure that correct version of Beam SDK is
>>>>>>> installed."
>>>>>>>
>>>>>>> It seems like the batch worker needs to be rebuilt. Not sure why the
>>>>>>> streaming worker did not fail (does it have the correct version? or
>>>>>>> does it not have the same check?)
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>> On Fri, Sep 4, 2020 at 1:33 PM Valentyn Tymofieiev <
>>>>>>> valen...@google.com> wrote:
>>>>>>>
>>>>>>>> Dataflow containers are also available now.
>>>>>>>>
>>>>>>>> On Thu, Sep 3, 2020 at 11:47 PM Daniel Oliveira <
>>>>>>>> danolive...@google.com> wrote:
>>>>>>>>
>>>>>>>>> This shou

[VOTE] Release 2.24.0, release candidate #3

2020-09-03 Thread Daniel Oliveira
Hi everyone,
Please review and vote on the release candidate #3 for the version 2.24.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.24.0-RC3" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].
* Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.24.0 release to help with validation
[9].
* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.
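
The adoption rule above can be sketched as a small tally function. This is one reading of the rule, stated as an assumption: the majority is counted over binding (PMC) votes only, and at least three binding +1 votes are needed. The `Vote` type and `vote_passes` function are hypothetical names used for illustration.

```python
from typing import Iterable, Tuple

# (value, binding): value is +1 or -1; binding is True for a PMC member's vote.
Vote = Tuple[int, bool]


def vote_passes(votes: Iterable[Vote], min_binding: int = 3) -> bool:
    """Return True if binding +1s outnumber binding -1s and reach min_binding."""
    binding_values = [value for value, binding in votes if binding]
    approvals = sum(1 for value in binding_values if value > 0)
    rejections = sum(1 for value in binding_values if value < 0)
    return approvals >= min_binding and approvals > rejections
```

Under this reading, five +1 votes of which three are binding pass, while five +1 votes with only two binding do not.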

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
[2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1110/
[5] https://github.com/apache/beam/tree/v2.24.0-RC3
[6] https://github.com/apache/beam/pull/12743
[7] https://github.com/apache/beam-site/pull/607
[8] https://github.com/apache/beam/pull/12745
[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
[10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
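
When checking the signing key against the fingerprint announced above, note that tools often print fingerprints in space-separated groups. A minimal sketch of a comparison helper follows, assuming the reported fingerprint is available as a string; `normalize_fingerprint` and `matches_announced` are hypothetical names, and this does not replace verifying the signature itself.

```python
# Fingerprint as announced in the vote email.
ANNOUNCED = "D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F"


def normalize_fingerprint(text: str) -> str:
    """Drop all whitespace and upper-case the hex digits."""
    return "".join(text.split()).upper()


def matches_announced(reported: str, announced: str = ANNOUNCED) -> bool:
    """Compare a tool-reported fingerprint with the announced one."""
    return normalize_fingerprint(reported) == normalize_fingerprint(announced)
```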


Re: [VOTE] Release 2.24.0, release candidate #2

2020-09-03 Thread Daniel Oliveira
Alright, it seems like this is a regression so it'll definitely need to be
fixed. I'm going to be out of office starting tomorrow until next Tuesday,
so I'll try to get a new release candidate out tonight so it's available
while I'm out.

Anyway, this vote is closed.

As usual, I recommend continuing to test RC2 until RC3 is out.

On Thu, Sep 3, 2020 at 12:51 PM Pablo Estrada  wrote:

> -1
> I've confirmed the issue reproduces in 2.24. The fix includes a test case.
> (https://github.com/apache/beam/pull/12761). I will send a cherry-pick
> for this.
> Best
> -P.
>
> On Thu, Sep 3, 2020 at 12:14 PM Pablo Estrada  wrote:
>
>> I just discovered there may be an issue in the BigQuery connector for
>> Python, where very large imports to BQ may not be working properly due to
>> copy job ids being duplicated (PR to fix on master is here:
>> https://github.com/apache/beam/pull/12761)
>> I will try to reproduce the error on this RC, and if I can repro it, then
>> I'll vote -1, as the WriteToBigQuery transform is used by many users.
>>
>> On Thu, Sep 3, 2020 at 2:12 AM Ismaël Mejía  wrote:
>>
>>> I just want to confirm that the issue I reported in RC1 is now fixed.
>>> Thanks Daniel!
>>>
>>> On Thu, Sep 3, 2020 at 6:44 AM Daniel Oliveira 
>>> wrote:
>>> >
>>> > This RC was built with the expected version of OpenJDK, so it should
>>> fix the issue from the previous RC.
>>> >
>>> > Unfortunately Dataflow containers are not yet ready, so there will be
>>> some delay before Dataflow can be tested. I'm working on getting that done
>>> ASAP and will update this thread once they're ready.
>>> >
>>> > On Wed, Sep 2, 2020 at 9:40 PM Daniel Oliveira 
>>> wrote:
>>> >>
>>> >> Hi everyone,
>>> >> Please review and vote on the release candidate #2 for the version
>>> 2.24.0, as follows:
>>> >> [ ] +1, Approve the release
>>> >> [ ] -1, Do not approve the release (please provide specific comments)
>>> >>
>>> >>
>>> >> The complete staging area is available for your review, which
>>> includes:
>>> >> * JIRA release notes [1],
>>> >> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>>> >> * all artifacts to be deployed to the Maven Central Repository [4],
>>> >> * source code tag "v2.24.0-RC2" [5],
>>> >> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>> >> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0.
>>> >> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> >> * Validation sheet with a tab for 2.24.0 release to help with
>>> validation [9].
>>> >> * Docker images published to Docker Hub [10].
>>> >>
>>> >> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>> >>
>>> >> Thanks,
>>> >> Release Manager
>>> >>
>>> >> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1109/
>>> >> [5] https://github.com/apache/beam/tree/v2.24.0-RC2
>>> >> [6] https://github.com/apache/beam/pull/12743
>>> >> [7] https://github.com/apache/beam-site/pull/607
>>> >> [8] https://github.com/apache/beam/pull/12745
>>> >> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>>> >> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>> >>
>>>
>>


Re: [VOTE] Release 2.24.0, release candidate #2

2020-09-02 Thread Daniel Oliveira
This RC was built with the expected version of OpenJDK, so it should fix
the issue from the previous RC.

Unfortunately Dataflow containers are not yet ready, so there will be some
delay before Dataflow can be tested. I'm working on getting that done ASAP
and will update this thread once they're ready.

On Wed, Sep 2, 2020 at 9:40 PM Daniel Oliveira 
wrote:

> Hi everyone,
> Please review and vote on the release candidate #2 for the version 2.24.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint
> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.24.0-RC2" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven 3.6.3 and OpenJDK 1.8.0.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.24.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1109/
> [5] https://github.com/apache/beam/tree/v2.24.0-RC2
> [6] https://github.com/apache/beam/pull/12743
> [7] https://github.com/apache/beam-site/pull/607
> [8] https://github.com/apache/beam/pull/12745
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>
>


Re: [VOTE] Release 2.24.0, release candidate #1

2020-09-01 Thread Daniel Oliveira
ssElement(GroupAlsoByWindowEvaluatorFactory.java:185)
> 2020-09-01T08:19:46.7759005Z at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
> 2020-09-01T08:19:46.7759577Z at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
> 2020-09-01T08:19:46.7760252Z at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 2020-09-01T08:19:46.7760843Z at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-09-01T08:19:46.7761345Z at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 2020-09-01T08:19:46.7761910Z at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-09-01T08:19:46.7762418Z at java.lang.Thread.run(Thread.java:748)
>
> I checked the generated classes and they are v52 (Java 8 compatible) but
> something seems to be wrong. Both the latest 2.24.0-SNAPSHOT version and
> the
> current 2.25.0-SNAPSHOT version pass the tests without issues. The issue
> seems
> to be only with the RC1 artifacts.
>
>
> On Tue, Sep 1, 2020 at 8:17 PM Robert Bradshaw 
> wrote:
> >
> > On Tue, Sep 1, 2020 at 10:41 AM Daniel Oliveira 
> wrote:
> > >
> > > I should probably call out that Dataflow containers aren't built yet
> (I will be building them today), so testing of Dataflow should probably
> wait until tomorrow.
> > >
> > > > If Java 11 was used to build the release artifacts, does this create
> any backwards-compatibility challenges for Java 8 users?
> > >
> > > It's definitely possible, but I only realized this at the end of the
> process, so I elected to just finish the release candidate rather than
> restarting the whole process, so we can find out in testing.
> >
> > I am concerned testing alone may not cover the issues in corner cases
> > that this may cause (and users would hit). I'd prefer we just re-build
> > it with Java 8 to be safe. (That doesn't preclude validating the
> > artifacts in other ways in the meantime.)
> >
> > > On Tue, Sep 1, 2020 at 9:14 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> > >>
> > >> > * Java artifacts were built with Maven 3.6.3 and OpenJDK 11.0.7.
> > >>
> > >> If Java 11 was used to build the release artifacts, does this create
> any backwards-compatibility challenges for Java 8 users?
> > >>
> > >> On Mon, Aug 31, 2020 at 8:59 PM Daniel Oliveira <
> danolive...@google.com> wrote:
> > >>>
> > >>> Hi everyone,
> > >>> Please review and vote on the release candidate #1 for the version
> > >>> 2.24.0, as follows:
> > >>> [ ] +1, Approve the release
> > >>> [ ] -1, Do not approve the release (please provide specific comments)
> > >>>
> > >>>
> > >>> The complete staging area is available for your review, which includes:
> > >>> * JIRA release notes [1],
> > >>> * the official Apache source release to be deployed to dist.apache.org
> > >>> [2], which is signed with the key with fingerprint
> > >>> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
> > >>> * all artifacts to be deployed to the Maven Central Repository [4],
> > >>> * source code tag "v2.24.0-RC1" [5],
> > >>> * website pull request listing the release [6], publishing the API
> > >>> reference manual [7], and the blog post [8].
> > >>> * Java artifacts were built with Maven 3.6.3 and OpenJDK 11.0.7.
> > >>> * Python artifacts are deployed along with the source release to the
> > >>> dist.apache.org [2].
> > >>> * Validation sheet with a tab for 2.24.0 release to help with validation
> > >>> [9].
> > >>> * Docker images published to Docker Hub [10].
> > >>>
> > >>> The vote will be open for at least 72 hours. It is adopted by majority
> > >>> approval, with at least 3 PMC affirmative votes.
> > >>>
> > >>> Thanks,
> > >>> Release Manager
> > >>>
> > >>> [1]
> > >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
> > >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
> > >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > >>> [4]
> > >>> https://repository.apache.org/content/repositories/orgapachebeam-1108/
> > >>> [5] https://github.com/apache/beam/tree/v2.24.0-RC1
> > >>> [6] https://github.com/apache/beam/pull/12743
> > >>> [7] https://github.com/apache/beam-site/pull/607
> > >>> [8] https://github.com/apache/beam/pull/12745
> > >>> [9]
> > >>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
> > >>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>


Re: [VOTE] Release 2.24.0, release candidate #1

2020-09-01 Thread Daniel Oliveira
I should probably call out that Dataflow containers aren't built yet (I
will be building them today), so testing of Dataflow should probably wait
until tomorrow.

> If Java 11 was used to build the release artifacts, does this create any
backwards-compatibility challenges for Java 8 users?

It's definitely possible, but I only realized this at the end of the
process, so I elected to just finish the release candidate rather than
restarting the whole process, so we can find out in testing.
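
One check that can be run on the candidate artifacts while that testing happens is the class-file version, which is stored in the first 8 bytes of every .class file (52 corresponds to Java 8, 55 to Java 11). The following is a minimal sketch in Python, not part of the Beam release tooling; the helper names and the synthetic headers are illustrative assumptions. Note that a passing version check alone may not rule out problems: classes compiled by a newer JDK for an older target can still reference library methods that a Java 8 runtime lacks.

```python
import struct

# Class-file major versions for the two JDKs discussed in this thread.
JAVA_8_MAJOR = 52
JAVA_11_MAJOR = 55


def class_major_version(header: bytes) -> int:
    """Return the major version from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major


def runs_on_java_8(header: bytes) -> bool:
    """True if the class-file version can be loaded by a Java 8 JVM."""
    return class_major_version(header) <= JAVA_8_MAJOR


# Synthetic headers for illustration only; real input would be the first
# 8 bytes of a .class entry read from a release jar.
java8_header = bytes.fromhex("cafebabe00000034")   # major 0x34 = 52
java11_header = bytes.fromhex("cafebabe00000037")  # major 0x37 = 55
print(runs_on_java_8(java8_header), runs_on_java_8(java11_header))
```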

On Tue, Sep 1, 2020 at 9:14 AM Valentyn Tymofieiev 
wrote:

> > * Java artifacts were built with Maven 3.6.3 and OpenJDK 11.0.7.
>
> If Java 11 was used to build the release artifacts, does this create any
> backwards-compatibility challenges for Java 8 users?
>
> On Mon, Aug 31, 2020 at 8:59 PM Daniel Oliveira 
> wrote:
>
>> Hi everyone,
>> Please review and vote on the release candidate #1 for the version
>> 2.24.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.24.0-RC1" [5],
>> * website pull request listing the release [6], publishing the API
>> reference manual [7], and the blog post [8].
>> * Java artifacts were built with Maven 3.6.3 and OpenJDK 11.0.7.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.24.0 release to help with validation
>> [9].
>> * Docker images published to Docker Hub [10].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Release Manager
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1108/
>> [5] https://github.com/apache/beam/tree/v2.24.0-RC1
>> [6] https://github.com/apache/beam/pull/12743
>> [7] https://github.com/apache/beam-site/pull/607
>> [8] https://github.com/apache/beam/pull/12745
>> [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>
>


[VOTE] Release 2.24.0, release candidate #1

2020-08-31 Thread Daniel Oliveira
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.24.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.24.0-RC1" [5],
* website pull request listing the release [6], publishing the API
reference manual [7], and the blog post [8].
* Java artifacts were built with Maven 3.6.3 and OpenJDK 11.0.7.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.24.0 release to help with validation
[9].
* Docker images published to Docker Hub [10].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.
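
The adoption rule above can be written down as a small check. This is only a sketch of the rule as stated in this message (open at least 72 hours, at least 3 PMC +1 votes, more +1 than -1); the function name and parameters are hypothetical and it is not an official tallying tool.

```python
from datetime import datetime, timedelta


def vote_passes(pmc_plus_ones: int, pmc_minus_ones: int,
                opened: datetime, now: datetime,
                min_hours: int = 72) -> bool:
    """Apply the rule from the vote email to a tally of binding (PMC) votes."""
    # The vote must have been open for the minimum period.
    open_long_enough = now - opened >= timedelta(hours=min_hours)
    # Majority approval with at least three affirmative PMC votes.
    enough_approvals = pmc_plus_ones >= 3 and pmc_plus_ones > pmc_minus_ones
    return open_long_enough and enough_approvals


# Example tally with made-up numbers.
opened_at = datetime(2020, 8, 31, 20, 59)
print(vote_passes(3, 0, opened_at, opened_at + timedelta(hours=72)))
```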

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
[2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1108/
[5] https://github.com/apache/beam/tree/v2.24.0-RC1
[6] https://github.com/apache/beam/pull/12743
[7] https://github.com/apache/beam-site/pull/607
[8] https://github.com/apache/beam/pull/12745
[9]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
[10] https://hub.docker.com/search?q=apache%2Fbeam&type=image


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-08-27 Thread Daniel Oliveira
Hey Eugene,

That Jira is a bit misleading, it's still tracking a root cause, but a
workaround was submitted so it's no longer blocking the release. I'll
remove the release tag from it to avoid that confusion.

I've been trying to get a release candidate out since last Thursday, but
between several bugs I ran into and other time-sensitive work that delayed
me, it's taken a while. I've been getting help from some previous release
managers thankfully, or it probably would've taken even longer. Anyway,
following the release guide
<https://beam.apache.org/contribute/release-guide/#7-build-a-release-candidate>,
I just finished step 7 last night after working around the last bug that
was blocking me, and I'm continuing from there today, so hopefully I'll be
able to have the release candidate ready before the week is up.

Hope the update is helpful,
Daniel Oliveira

On Thu, Aug 27, 2020 at 12:56 PM Eugene Kirpichov 
wrote:

> Hi!
>
> Just wondering how the progress on 2.24 has been?
> I see the version in JIRA
> https://issues.apache.org/jira/projects/BEAM/versions/12347146 is blocked
> only by https://issues.apache.org/jira/browse/BEAM-10663 which hasn't
> seen much action in the last week. Is there something specific people can
> help with?
>
> Thanks!
>
> On 2020/08/12 01:59:20, Daniel Oliveira  wrote:
> > I'd like to send out a last minute reminder to fill out CHANGES.md
> > <https://github.com/apache/beam/blob/master/CHANGES.md> with any major
> > changes that are going to be in 2.24.0. If you need a quick review for
> > that, just add me as a reviewer to your PR (GitHub username is
> > "youngoli"). I'll keep an eye out for those until around 5 PM.
> >
> > On another note, I need some help with setup from the release guide
> > <https://beam.apache.org/contribute/release-guide/#one-time-setup-instructions>:
> > 1. I need someone to add me as a maintainer of the apache-beam package on
> > PyPI. Username: danoliveira
> > 2. Someone might need to create a new version in JIRA
> > <https://beam.apache.org/contribute/release-guide/#create-a-new-version-in-jira>.
> > I'm not sure about this one because 2.25.0 already exists, I don't know if
> > 2.26.0 needs to be created or if that's for the next release.
> >
> > On Mon, Aug 10, 2020 at 8:27 PM Daniel Oliveira wrote:
> >
> > > Hi everyone,
> > >
> > > It seems like there's no objections, so I'm preparing to cut the release
> > > on Wednesday.
> > >
> > > As a reminder, if you have any release-blocking issues, please have a JIRA
> > > and set "Fix version" to 2.24.0. For non-blocking issues, please set "Fix
> > > version" only once the issue is actually resolved, otherwise it makes it
> > > more difficult to differentiate release-blocking issues from non-blocking.
> > >
> > > Thanks,
> > > Daniel Oliveira
> > >
> > > On Thu, Aug 6, 2020 at 4:53 PM Rui Wang  wrote:
> > >
> > >> Awesome!
> > >>
> > >>
> > >> -Rui
> > >>
> > >> On Thu, Aug 6, 2020 at 4:14 PM Ahmet Altay wrote:
> > >>
> > >>> +1 - Thank you Daniel!!
> > >>>
> > >>> On Wed, Jul 29, 2020 at 4:30 PM Daniel Oliveira wrote:
> > >>>
> > >>>> > You probably meant 2.24.0.
> > >>>>
> > >>>> Thanks, yes I did. Mark "Fix Version/s" as "2.24.0" everyone. :)
> > >>>>
> > >>>> On Wed, Jul 29, 2020 at 4:14 PM Valentyn Tymofieiev
> > >>>> <valen...@google.com> wrote:
> > >>>>
> > >>>>> +1, Thanks Daniel!
> > >>>>>
> > >>>>> On Wed, Jul 29, 2020 at 4:04 PM Daniel Oliveira
> > >>>>> <danolive...@google.com> wrote:
> > >>>>>
> > >>>>>> Hi everyone,
> > >>>>>>
> > >>>>>> The next Beam release branch (2.24.0) is scheduled to be cut on
> > >>>>>> August 12 according to the release calendar [1].
> > >>>>>>
> > >>>>>> I'd like to volunteer to handle this release. Following the lead of
> > >>>>>> previous release managers, I plan on cutting the branch on that date and
> > >>>>>> cherrypicking in release-blocking fixes afterwards. So unresolved release
> > >>>>>> blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".
> > >>>>>>
> > >>>>> You probably meant 2.24.0 [1].
> > >>>>>
> > >>>>>
> > >>>>>> Any comments or objections?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Daniel Oliveira
> > >>>>>>
> > >>>>>> [1]
> > >>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> > >>>>>>
> > >>>>> [1] https://issues.apache.org/jira/projects/BEAM/versions/12347146
> > >>>>>>
> > >>>>>
> >
>


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-08-12 Thread Daniel Oliveira
Release branch has been cut
<https://github.com/apache/beam/tree/release-2.24.0>. Reminder: Don't
commit to the release branch directly. If you have a fix for a
release-blocking issue, contact me to include it as a cherry pick.

Next steps in the process are for me to verify the release branch and to
triage release-blocking bugs
<https://issues.apache.org/jira/projects/BEAM/versions/12347146>. I've
already begun going through bugs and have been leaving comments, so if you
have any release-blocking bugs on Jira, please check them.

If you have an existing bug that you believe is release-blocking, check it
against the requirements in the Beam release guide
<https://beam.apache.org/contribute/release-guide/#6-triage-release-blocking-issues-in-jira>
first,
and if you believe a cherry-pick would be accepted then mark the Jira with
"2.24.0" as the fix version and send me the PR that will need to be
cherry-picked.

On Tue, Aug 11, 2020 at 6:59 PM Daniel Oliveira 
wrote:

> I'd like to send out a last minute reminder to fill out CHANGES.md
> <https://github.com/apache/beam/blob/master/CHANGES.md> with any major
> changes that are going to be in 2.24.0. If you need a quick review for
> that, just add me as a reviewer to your PR (GitHub username is "youngoli").
> I'll keep an eye out for those until around 5 PM.
>
> On another note, I need some help with setup from the release guide
> <https://beam.apache.org/contribute/release-guide/#one-time-setup-instructions>
> :
> 1. I need someone to add me as a maintainer of the apache-beam package on
> PyPI. Username: danoliveira
> 2. Someone might need to create a new version in JIRA
> <https://beam.apache.org/contribute/release-guide/#create-a-new-version-in-jira>.
> I'm not sure about this one because 2.25.0 already exists, I don't know if
> 2.26.0 needs to be created or if that's for the next release.
>
> On Mon, Aug 10, 2020 at 8:27 PM Daniel Oliveira 
> wrote:
>
>> Hi everyone,
>>
>> It seems like there's no objections, so I'm preparing to cut the release
>> on Wednesday.
>>
>> As a reminder, if you have any release-blocking issues, please have a
>> JIRA and set "Fix version" to 2.24.0. For non-blocking issues, please set
>> "Fix version" only once the issue is actually resolved, otherwise it makes
>> it more difficult to differentiate release-blocking issues from
>> non-blocking.
>>
>> Thanks,
>> Daniel Oliveira
>>
>> On Thu, Aug 6, 2020 at 4:53 PM Rui Wang  wrote:
>>
>>> Awesome!
>>>
>>>
>>> -Rui
>>>
>>> On Thu, Aug 6, 2020 at 4:14 PM Ahmet Altay  wrote:
>>>
>>>> +1 - Thank you Daniel!!
>>>>
>>>> On Wed, Jul 29, 2020 at 4:30 PM Daniel Oliveira 
>>>> wrote:
>>>>
>>>>> > You probably meant 2.24.0.
>>>>>
>>>>> Thanks, yes I did. Mark "Fix Version/s" as "2.24.0" everyone. :)
>>>>>
>>>>> On Wed, Jul 29, 2020 at 4:14 PM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> +1, Thanks Daniel!
>>>>>>
>>>>>> On Wed, Jul 29, 2020 at 4:04 PM Daniel Oliveira <
>>>>>> danolive...@google.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> The next Beam release branch (2.24.0) is scheduled to be cut on
>>>>>>> August 12 according to the release calendar [1].
>>>>>>>
>>>>>>> I'd like to volunteer to handle this release. Following the lead of
>>>>>>> previous release managers, I plan on cutting the branch on that date and
>>>>>>> cherrypicking in release-blocking fixes afterwards. So unresolved release
>>>>>>> blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".
>>>>>>>
>>>>>> You probably meant 2.24.0 [1].
>>>>>>
>>>>>>
>>>>>>> Any comments or objections?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Daniel Oliveira
>>>>>>>
>>>>>>> [1]
>>>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>>>>>
>>>>>> [1] https://issues.apache.org/jira/projects/BEAM/versions/12347146
>>>>>>
>>>>>


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-08-11 Thread Daniel Oliveira
I'd like to send out a last minute reminder to fill out CHANGES.md
<https://github.com/apache/beam/blob/master/CHANGES.md> with any major
changes that are going to be in 2.24.0. If you need a quick review for
that, just add me as a reviewer to your PR (GitHub username is "youngoli").
I'll keep an eye out for those until around 5 PM.

On another note, I need some help with setup from the release guide
<https://beam.apache.org/contribute/release-guide/#one-time-setup-instructions>
:
1. I need someone to add me as a maintainer of the apache-beam package on
PyPI. Username: danoliveira
2. Someone might need to create a new version in JIRA
<https://beam.apache.org/contribute/release-guide/#create-a-new-version-in-jira>.
I'm not sure about this one because 2.25.0 already exists, I don't know if
2.26.0 needs to be created or if that's for the next release.

On Mon, Aug 10, 2020 at 8:27 PM Daniel Oliveira 
wrote:

> Hi everyone,
>
> It seems like there's no objections, so I'm preparing to cut the release
> on Wednesday.
>
> As a reminder, if you have any release-blocking issues, please have a JIRA
> and set "Fix version" to 2.24.0. For non-blocking issues, please set "Fix
> version" only once the issue is actually resolved, otherwise it makes it
> more difficult to differentiate release-blocking issues from non-blocking.
>
> Thanks,
> Daniel Oliveira
>
> On Thu, Aug 6, 2020 at 4:53 PM Rui Wang  wrote:
>
>> Awesome!
>>
>>
>> -Rui
>>
>> On Thu, Aug 6, 2020 at 4:14 PM Ahmet Altay  wrote:
>>
>>> +1 - Thank you Daniel!!
>>>
>>> On Wed, Jul 29, 2020 at 4:30 PM Daniel Oliveira 
>>> wrote:
>>>
>>>> > You probably meant 2.24.0.
>>>>
>>>> Thanks, yes I did. Mark "Fix Version/s" as "2.24.0" everyone. :)
>>>>
>>>> On Wed, Jul 29, 2020 at 4:14 PM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>>
>>>>> +1, Thanks Daniel!
>>>>>
>>>>> On Wed, Jul 29, 2020 at 4:04 PM Daniel Oliveira <
>>>>> danolive...@google.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> The next Beam release branch (2.24.0) is scheduled to be cut on
>>>>>> August 12 according to the release calendar [1].
>>>>>>
>>>>>> I'd like to volunteer to handle this release. Following the lead of
>>>>>> previous release managers, I plan on cutting the branch on that date and
>>>>>> cherrypicking in release-blocking fixes afterwards. So unresolved release
>>>>>> blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".
>>>>>>
>>>>> You probably meant 2.24.0 [1].
>>>>>
>>>>>
>>>>>> Any comments or objections?
>>>>>>
>>>>>> Thanks,
>>>>>> Daniel Oliveira
>>>>>>
>>>>>> [1]
>>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>>>>
>>>>> [1] https://issues.apache.org/jira/projects/BEAM/versions/12347146
>>>>>
>>>>


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-08-10 Thread Daniel Oliveira
Hi everyone,

It seems like there's no objections, so I'm preparing to cut the release on
Wednesday.

As a reminder, if you have any release-blocking issues, please have a JIRA
and set "Fix version" to 2.24.0. For non-blocking issues, please set "Fix
version" only once the issue is actually resolved, otherwise it makes it
more difficult to differentiate release-blocking issues from non-blocking.
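
Put another way, under this convention a release blocker is any issue that is both unresolved and tagged with the release's fix version. A minimal sketch of that filter, in Python and over made-up records (the `Issue` type and the EXAMPLE-n keys are hypothetical, not real JIRA data or the JIRA API):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Issue:
    key: str
    fix_version: Optional[str]  # None until someone sets "Fix version"
    resolved: bool


def release_blockers(issues: List[Issue], version: str) -> List[str]:
    """Unresolved issues tagged with the release version count as blockers."""
    return [i.key for i in issues
            if i.fix_version == version and not i.resolved]


issues = [
    Issue("EXAMPLE-1", "2.24.0", False),  # tagged and unresolved: a blocker
    Issue("EXAMPLE-2", "2.24.0", True),   # resolved, so tagging it is fine
    Issue("EXAMPLE-3", None, False),      # non-blocking: left untagged for now
]
print(release_blockers(issues, "2.24.0"))
```

This is why tagging an unresolved non-blocking issue early is a problem: it would be indistinguishable from EXAMPLE-1 in the filter.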

Thanks,
Daniel Oliveira

On Thu, Aug 6, 2020 at 4:53 PM Rui Wang  wrote:

> Awesome!
>
>
> -Rui
>
> On Thu, Aug 6, 2020 at 4:14 PM Ahmet Altay  wrote:
>
>> +1 - Thank you Daniel!!
>>
>> On Wed, Jul 29, 2020 at 4:30 PM Daniel Oliveira 
>> wrote:
>>
>>> > You probably meant 2.24.0.
>>>
>>> Thanks, yes I did. Mark "Fix Version/s" as "2.24.0" everyone. :)
>>>
>>> On Wed, Jul 29, 2020 at 4:14 PM Valentyn Tymofieiev 
>>> wrote:
>>>
>>>> +1, Thanks Daniel!
>>>>
>>>> On Wed, Jul 29, 2020 at 4:04 PM Daniel Oliveira 
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> The next Beam release branch (2.24.0) is scheduled to be cut on August
>>>>> 12 according to the release calendar [1].
>>>>>
>>>>> I'd like to volunteer to handle this release. Following the lead of
>>>>> previous release managers, I plan on cutting the branch on that date and
>>>>> cherrypicking in release-blocking fixes afterwards. So unresolved release
>>>>> blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".
>>>>>
>>>> You probably meant 2.24.0 [1].
>>>>
>>>>
>>>>> Any comments or objections?
>>>>>
>>>>> Thanks,
>>>>> Daniel Oliveira
>>>>>
>>>>> [1]
>>>>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>>>>
>>>> [1] https://issues.apache.org/jira/projects/BEAM/versions/12347146
>>>>
>>>


Re: [PROPOSAL] Preparing for Beam 2.24.0 release

2020-07-29 Thread Daniel Oliveira
> You probably meant 2.24.0.

Thanks, yes I did. Mark "Fix Version/s" as "2.24.0" everyone. :)

On Wed, Jul 29, 2020 at 4:14 PM Valentyn Tymofieiev 
wrote:

> +1, Thanks Daniel!
>
> On Wed, Jul 29, 2020 at 4:04 PM Daniel Oliveira 
> wrote:
>
>> Hi everyone,
>>
>> The next Beam release branch (2.24.0) is scheduled to be cut on August 12
>> according to the release calendar [1].
>>
>> I'd like to volunteer to handle this release. Following the lead of
>> previous release managers, I plan on cutting the branch on that date and
>> cherrypicking in release-blocking fixes afterwards. So unresolved release
>> blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".
>>
> You probably meant 2.24.0 [1].
>
>
>> Any comments or objections?
>>
>> Thanks,
>> Daniel Oliveira
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>>
> [1] https://issues.apache.org/jira/projects/BEAM/versions/12347146
>


[PROPOSAL] Preparing for Beam 2.24.0 release

2020-07-29 Thread Daniel Oliveira
Hi everyone,

The next Beam release branch (2.24.0) is scheduled to be cut on August 12
according to the release calendar [1].

I'd like to volunteer to handle this release. Following the lead of
previous release managers, I plan on cutting the branch on that date and
cherrypicking in release-blocking fixes afterwards. So unresolved release
blocking JIRA issues should have their "Fix Version/s" marked as "2.23.0".

Any comments or objections?

Thanks,
Daniel Oliveira

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com


Re: Go error when building containers

2020-07-23 Thread Daniel Oliveira
It looks like the cached version of a package is stale and is causing build
errors when building Beam. Chances are that just deleting that
/.gogradle directory will cause everything to rebuild from a clean state,
so I'd try that. I think it should be /sdks/go/.gogradle
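
As a small sketch of that cleanup, in Python: the `sdks/go/.gogradle` location is taken from the guess above, while the function name and the `beam_root` parameter (the path to a local checkout) are assumptions for illustration. The demo runs against a temporary directory rather than a real checkout.

```python
import shutil
import tempfile
from pathlib import Path


def clear_gogradle_cache(beam_root) -> bool:
    """Delete sdks/go/.gogradle under beam_root; return True if it existed."""
    cache_dir = Path(beam_root) / "sdks" / "go" / ".gogradle"
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


# Demo on a throwaway directory tree that mimics the layout.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sdks" / "go" / ".gogradle").mkdir(parents=True)
    first = clear_gogradle_cache(root)   # cache present, gets removed
    second = clear_gogradle_cache(root)  # nothing left to remove
print(first, second)
```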

On Thu, Jul 23, 2020 at 4:56 PM Ahmet Altay  wrote:

> This is probably: https://issues.apache.org/jira/browse/BEAM-10567
>
> On Thu, Jul 23, 2020 at 4:53 PM Brian Hulette  wrote:
>
>> Whenever I build a container locally
>> (:sdks:java:container:docker, :sdks:python:container:py37:docker, ..) I get
>> a Go error (log at the end of this message).
>>
>> I've discovered I can just comment out resolveBuildDependencies.dependsOn
>> ":sdks:go:goBuild" in the relevant build.gradle file [1] whenever this
>> happens, but it's getting old and I'm wondering if there's a better way. Is
>> there something wrong with my environment that's causing these errors (It
>> must not be an actual breakage in the Go SDK)? Can we remove or modify this
>> statement to fix this?
>>
>> Thanks,
>> Brian
>>
>> [1]
>> https://github.com/apache/beam/blob/59b7200c8621b81804d53ded771fd3aa525fbb47/sdks/java/container/build.gradle#L32
>>
>> # github.com/apache/beam/sdks/go/test/integration/synthetic
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:31:31: cannot use s (type "github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam".Scope) as type "github.com/apache/beam/sdks/go/pkg/beam".Scope in argument to synthetic.SourceSingle
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:33:24: cannot use s (type "github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam".Scope) as type "github.com/apache/beam/sdks/go/pkg/beam".Scope in argument to synthetic.Step
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:34:2: undefined: passert.Count
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:51:25: cannot use s (type "github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam".Scope) as type "github.com/apache/beam/sdks/go/pkg/beam".Scope in argument to synthetic.Source
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:51:25: cannot use configs (type "github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam".PCollection) as type "github.com/apache/beam/sdks/go/pkg/beam".PCollection in argument to synthetic.Source
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:52:24: cannot use s (type "github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam".Scope) as type "github.com/apache/beam/sdks/go/pkg/beam".Scope in argument to synthetic.Step
>> .gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/integration/synthetic/synthetic.go:61:2: undefined: passert.Count
>>
>> > Task :sdks:go:buildLinuxAmd64 FAILED
>>
>


Re: Is this SO question showing a bug in Java Reshuffle? Can someone take a look?

2020-05-29 Thread Daniel Oliveira
I asked the user to check if it was just the GBK or the entire Reshuffle,
and they confirmed it was the entire Reshuffle. Also their pipeline did
ultimately not have everything that was expected to be output. I'm still
asking the user for more info to make sure this isn't a bug on the Dataflow
side.
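
For context on why a count mismatch would be surprising, the shape Robert describes in the quoted reply (key the elements, group by key, then expand the grouped iterables) can be sketched in plain Python. This is a toy model with hypothetical names, not Beam's Reshuffle implementation: key collisions make the grouped step smaller, but expanding the groups restores every element.

```python
import random
from collections import defaultdict


def toy_reshuffle(elements, num_keys=4, seed=0):
    """Toy model: assign random keys, group by key, then expand the groups."""
    rng = random.Random(seed)
    # Step 1: pair each element with a random key (collisions are expected).
    keyed = [(rng.randrange(num_keys), e) for e in elements]
    # Step 2: group by key; this step emits at most num_keys groups.
    grouped = defaultdict(list)
    for key, element in keyed:
        grouped[key].append(element)
    # Step 3: expand each group's values back into individual elements.
    expanded = [e for values in grouped.values() for e in values]
    return expanded, len(grouped)


inputs = list(range(100))
outputs, group_count = toy_reshuffle(inputs)
print(len(inputs), group_count, len(outputs))
```

In this model the middle step reports far fewer outputs than inputs, while the final output count always matches the input count, which is the distinction the question to the user was meant to settle.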

On Fri, May 29, 2020 at 4:32 PM Robert Bradshaw  wrote:

> Reshuffle should be emitting exactly the same number of elements that it
> gets. The GBK inside Reshuffle may have slightly fewer due to key
> collisions, but the ExpandIterable step should take care of this. Do we
> have counts for that output? (I will say that seems to be an
> extraordinarily high number of collisions.)
>
> On Fri, May 29, 2020 at 3:34 PM Daniel Oliveira 
> wrote:
>
>> Hi dev list,
>>
>> While answering Stack Overflow questions I stumbled onto this:
>> https://stackoverflow.com/questions/62017572/beam-java-dataflow-bigquery-streaming-insert-groupbykey-reducing-elements
>>
>> The user's pipeline seems to have a Reshuffle outputting fewer elements
>> than it received, inside a BigQuery streaming insert. This looks like a bug
>> to me since I assume Reshuffle should always be outputting unchanged
>> elements, and I read through the code and as far as I can tell this
>> shouldn't be happening. But I'm not too familiar with the code in question
>> so I was hoping someone else with more context on it could help confirm.
>>
>> Thanks,
>> Daniel Oliveira
>>
>


Is this SO question showing a bug in Java Reshuffle? Can someone take a look?

2020-05-29 Thread Daniel Oliveira
Hi dev list,

While answering Stack Overflow questions I stumbled onto this:
https://stackoverflow.com/questions/62017572/beam-java-dataflow-bigquery-streaming-insert-groupbykey-reducing-elements

The user's pipeline seems to have a Reshuffle outputting fewer elements than
it received, inside a BigQuery streaming insert. This looks like a bug to
me since I assume Reshuffle should always be outputting unchanged elements,
and I read through the code and as far as I can tell this shouldn't be
happening. But I'm not too familiar with the code in question so I was
hoping someone else with more context on it could help confirm.

Thanks,
Daniel Oliveira


Re: [VOTE + INPUT] Beam Mascot Designs, 2nd iteration - Deadline Friday, March 27

2020-03-25 Thread Daniel Oliveira
>
> 1. Do you prefer red or black colored line art?


Red.


> 2. Do you have any additional feedback about the mascot's shape or
> features?


Love the new tail and new shadows.

I like the wings better with color, but they still feel a bit dull to me. I
feel they would be improved by having more vibrant colors near the tips,
and possibly by going with more yellow-ish colors closer to the Beam logo.
Compare with the wings from slide 10 of your previous deck,
which I like much better. Having the more vibrant color near the tips of
the wings also pairs well with the new tail, which does the same thing with
its yellow light.

On Wed, Mar 25, 2020 at 12:11 PM Julian Bruno 
wrote:

> Hello Apache Beam Community,
>
> Together with Aizhamal and her team, we have been working on the design of
> the Apache Beam mascot.
>
> We now need input from the community to continue moving forward with the
> design. Please share your input no later than Friday, March 27, at noon
> Pacific Time. Below you will find a link to the presentation of the work
> process and we are eager to know what you think of the current design [1].
>
> Our questions to you:
>
> 1. Do you prefer red or black colored line art?
>
> 2. Do you have any additional feedback about the mascot's shape or
> features?
>
> Please reply inline, so it is clear what exactly you are referring to. The
> vote and input phase will be open until Friday, March 27, at 12 pm Pacific
> Time. We will incorporate the feedback to the next design iteration of
> the mascot.
>
> Thank you,
>
>
> Julian Bruno // Visual Artist & Graphic Designer
>  (510) 367-0551 / SF Bay Area, CA
> www.instagram.com/julbro.art
>
> [1] Mascot Weekly Update - 3/25


Updating releases on Github release page.

2020-02-07 Thread Daniel Oliveira
Hey beam devs,

I saw a comment on SO that our releases on github (
https://github.com/apache/beam/releases) are stuck at 2.16.0. It looks like
that's still tagged as the "Latest Release", but the newer releases are
actually present in tiny words above it: "... Show 7 newer tags".

I wanted to fix this, but I'm not sure if it's intentional, and I have no
clue how to do so and am worried about messing something up. Anyone know
how to fix it? And do we need to add that step to release instructions for
the future?

Thanks,
Daniel Oliveira


Re: Go SplittableDoFn prototype and proposed changes

2020-01-27 Thread Daniel Oliveira
As a follow-up to the proposed changes from my first email, I've worked on
a doc with a more detailed changelist, including details still up for
discussion:
https://docs.google.com/document/d/1UeG5uNO00xCByGEZzDXk0m0LghX6HBWlMfRbMv_Xiyc/edit?usp=sharing

The doc is mostly full of my brainstorming on what the next version of the
user-facing Go SDF API will look like, so it's not too polished. But if
anyone's interested in this, I welcome any and all feedback!
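
For readers new to the thread, the first change proposed in the message quoted below is to check user-defined types by method name and signature at runtime instead of requiring them to implement fixed interfaces. As a rough analogy only, written in Python rather than Go and using hypothetical method names that are not the Go SDK's API, that kind of check can be sketched as:

```python
import inspect

# Hypothetical required methods and their parameter counts (excluding self).
REQUIRED_METHODS = {
    "create_restriction": 1,
    "create_tracker": 1,
    "process_element": 2,
}


def validate_sdf(fn) -> list:
    """Return a list of problems found by inspecting fn's methods."""
    problems = []
    for name, n_params in REQUIRED_METHODS.items():
        method = getattr(fn, name, None)
        if not callable(method):
            problems.append(f"missing method {name}")
            continue
        # For a bound method, signature() already omits self.
        got = len(inspect.signature(method).parameters)
        if got != n_params:
            problems.append(
                f"{name} takes {got} parameters, expected {n_params}")
    return problems


class GoodFn:
    def create_restriction(self, element):
        return (0, len(element))

    def create_tracker(self, restriction):
        return list(restriction)

    def process_element(self, tracker, element):
        return element[tracker[0]:tracker[1]]


class BadFn:
    def create_restriction(self, element):
        return (0, len(element))

    def process_element(self, element):
        return element


print(validate_sdf(GoodFn()))
print(validate_sdf(BadFn()))
```

The trade-off mirrors the one described in the proposal: the user writes less boilerplate, and mistakes surface as validation messages at registration time rather than as compile errors.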

On Mon, Jan 13, 2020 at 2:22 PM Luke Cwik  wrote:

> Thanks for the update and I agree with the points that you have made.
>
> On Fri, Jan 10, 2020 at 5:58 PM Robert Burke  wrote:
>
>> Thank you for sharing Daniel!
>>
>> Resolving SplittableDoFns for the Go SDK even just as far as initial
>> splitting will take the SDK that much closer to exiting its experimental
>> status.
>>
>> It's especially exciting seeing this work on Flink and on the Python
>> direct runner!
>>
>> On Fri, Jan 10, 2020, 5:36 PM Daniel Oliveira 
>> wrote:
>>
>>> Hey Beam devs,
>>>
>>> So several months ago I posted my Go SDF proposal and got a lot of good
>>> feedback (thread
>>> <https://lists.apache.org/thread.html/327bc72a0b30e18c6152b562bac2613c0edc942465d67b215830819e%40%3Cdev.beam.apache.org%3E>,
>>> doc <https://s.apache.org/beam-go-sdf>). Since then I've been working
>>> on implementing it and I've got an initial prototype ready to show off! It
>>> works with initial splitting on Flink, and has a decently documented API.
>>> Also in the second part of the email I'll also be proposing changes to the
>>> original doc, based on my experience working on this prototype.
>>>
>>> To be clear, this is *not* ready to officially go into Beam yet; the
>>> API is still likely to go through changes. Rather, I'm showing this off to
>>> show that progress is being made on SDFs, and to provide some context to
>>> the changes I'll be proposing below.
>>>
>>> Here's a link to the repo and branch so you can download it, and a link
>>> to the changes specifically:
>>> Repo: https://github.com/youngoli/beam/tree/gosdf
>>> Changes:
>>> https://github.com/apache/beam/commit/28140ee3471d6cb80e74a16e6fd108cc380d4831
>>>
>>> If you give it a try and have any thoughts, please let me know! I'm open
>>> to any and all feedback.
>>>
>>> ==
>>>
>>> Proposed Changes
>>> Doc: https://s.apache.org/beam-go-sdf (Select "Version 1" from version
>>> history.)
>>>
>>> For anyone reading this who hasn't already read the doc above, I suggest
>>> reading it first, since I'll be referring to concepts from it.
>>>
>>> After working on the prototype I've changed my mind on the original
>>> decisions to go with an interface approach and a combined restriction +
>>> tracker. But I don't want to go all in and create another doc with a
>>> detailed proposal, so I've laid out a brief summary of the changes to get
>>> some initial feedback before I go ahead and start working on these changes
>>> in detail. Please let me know what you think!
>>>
>>> *1. Change from native Go interfaces to dynamic reflection-based API.*
>>>
>>> Instead of the native Go interfaces (SplittableDoFn, RProvider, and
>>> RTracker) described in the doc and implemented in the prototype, use the
>>> same dynamic approach that the Go SDK already uses for DoFns: Use the
>>> reflection system to examine the names and signatures of methods in the
>>> user's DoFn, RProvider, and RTracker.
>>>
>>> Original approach reasoning:
>>>
>>>- Simpler, so faster to implement and less bug-prone.
>>>- The extra burden on the user to keep types consistent is ok since
>>>most users of SDFs are more advanced
>>>
>>> Change reasoning:
>>>
>>>- In the prototype, I found interfaces to require too much extra
>>>boilerplate which added more complexity than expected. (Examples: 
>>> Constant
>>>casting,
>>>- More consistent API: Inconsistency between regular DoFns (dynamic)
>>>and SDF API (interfaces) was jarring and unintuitive when implementing 
>>> SDFs
>>>as a user.
>>>
>>> Implementation: Full details are up for discussion, but the goal is to
>>> make the RProvider and  RTracker interfaces dynamic, so we can

Go SplittableDoFn prototype and proposed changes

2020-01-10 Thread Daniel Oliveira
Hey Beam devs,

So several months ago I posted my Go SDF proposal and got a lot of good
feedback (thread
<https://lists.apache.org/thread.html/327bc72a0b30e18c6152b562bac2613c0edc942465d67b215830819e%40%3Cdev.beam.apache.org%3E>,
doc <https://s.apache.org/beam-go-sdf>). Since then I've been working on
implementing it and I've got an initial prototype ready to show off! It
works with initial splitting on Flink, and has a decently documented API.
In the second part of this email I'll also be proposing changes to the
original doc, based on my experience working on this prototype.

To be clear, this is *not* ready to officially go into Beam yet; the API is
still likely to go through changes. Rather, I'm showing this off to show
that progress is being made on SDFs, and to provide some context to the
changes I'll be proposing below.

Here's a link to the repo and branch so you can download it, and a link to
the changes specifically:
Repo: https://github.com/youngoli/beam/tree/gosdf
Changes:
https://github.com/apache/beam/commit/28140ee3471d6cb80e74a16e6fd108cc380d4831

If you give it a try and have any thoughts, please let me know! I'm open to
any and all feedback.

==

Proposed Changes
Doc: https://s.apache.org/beam-go-sdf (Select "Version 1" from version
history.)

For anyone reading this who hasn't already read the doc above, I suggest
reading it first, since I'll be referring to concepts from it.

After working on the prototype I've changed my mind on the original
decisions to go with an interface approach and a combined restriction +
tracker. But I don't want to go all in and create another doc with a
detailed proposal, so I've laid out a brief summary of the changes to get
some initial feedback before I go ahead and start working on these changes
in detail. Please let me know what you think!

*1. Change from native Go interfaces to dynamic reflection-based API.*

Instead of the native Go interfaces (SplittableDoFn, RProvider, and
RTracker) described in the doc and implemented in the prototype, use the
same dynamic approach that the Go SDK already uses for DoFns: Use the
reflection system to examine the names and signatures of methods in the
user's DoFn, RProvider, and RTracker.

Original approach reasoning:

   - Simpler, so faster to implement and less bug-prone.
   - The extra burden on the user to keep types consistent is ok since most
   users of SDFs are more advanced

Change reasoning:

   - In the prototype, I found interfaces to require too much extra
   boilerplate which added more complexity than expected. (Examples: Constant
   casting,
   - More consistent API: Inconsistency between regular DoFns (dynamic) and
   SDF API (interfaces) was jarring and unintuitive when implementing SDFs as
   a user.

Implementation: Full details are up for discussion, but the goal is to make
the RProvider and RTracker interfaces dynamic, so we can replace all
instances of interface{} in the methods with the actual element types (i.e.
fake generics). Also, uses of the RProvider and RTracker interfaces in
signatures can be replaced with the concrete implementations of those
providers/trackers. This will require a good amount of additional work in
the DoFn validation codebase and the code generator. A fair amount of
additional user code validation and testing will also be needed, since the
new code is more complex.
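
To make the reflection idea a bit more concrete, here is a rough analogy in Python rather than Go. It is not the Go SDK's validation code: the method names, the parameter-count rule, and the validate_sdf helper are all hypothetical, made up for this sketch. It only shows the general shape of checking a user's type by the names and signatures of its methods instead of requiring it to implement a fixed interface.

```python
import inspect

# Hypothetical rule table: method name -> required number of parameters
# (not counting self). These names are illustrative only.
REQUIRED_METHODS = {
    "create_initial_restriction": 1,  # (element)
    "split_restriction": 2,           # (element, restriction)
    "create_tracker": 1,              # (restriction)
    "process_element": 2,             # (tracker, element)
}


def validate_sdf(fn):
    """Return a list of problems found by inspecting fn's methods."""
    problems = []
    for name, want in REQUIRED_METHODS.items():
        method = getattr(fn, name, None)
        if not callable(method):
            problems.append(f"missing method {name}")
            continue
        # For a bound method, the signature already excludes self.
        got = len(inspect.signature(method).parameters)
        if got != want:
            problems.append(f"{name}: want {want} parameters, got {got}")
    return problems


class GoodFn:
    def create_initial_restriction(self, element):
        return (0, len(element))

    def split_restriction(self, element, restriction):
        return [restriction]

    def create_tracker(self, restriction):
        return restriction

    def process_element(self, tracker, element):
        return []


class BadFn:
    # Wrong arity, and the other three methods are missing entirely.
    def create_initial_restriction(self):
        return None
```

The trade-off is the one described above: the user writes methods with their real types, and the framework takes on the validation work that a static interface would otherwise provide.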

*2. Separate the restriction tracker and restriction.*

Currently the API has the restriction combined with the tracker. In most
other SDKs and within the SDF model, the two are usually separate concepts,
and this change is to follow that approach and split the two.

Original approach reasoning:

   - It was considered simpler to avoid another level of type casting in
   the API with the interface approach.

Change reasoning:

   - We are no longer going with the interface approach. With "fake
   generics", it is simpler to keep the two concepts separate.
   - Requiring users to specify custom coders in order to only encode the
   restriction and not the tracker ended up adding additional complexity
   anyway.

Implementation: In the API, have the restriction tracker initialized with a
restriction object that is accessible via a getter. The restriction itself
will be the only thing serialized, so it will be wrapped and unwrapped with
the tracker before the user code is invoked. This would add very little work,
as it would mostly be bundled with the interface->dynamic approach change.
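
A minimal sketch of that split, written in Python purely for illustration (this is not the Go API, and OffsetRestriction, OffsetTracker, encode, decode, and run are hypothetical names). The restriction is plain data and is the only thing encoded; the tracker holds mutable claim state and is rebuilt around the decoded restriction before user code runs.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OffsetRestriction:
    """Plain data describing the work: the half-open range [start, end)."""
    start: int
    end: int


class OffsetTracker:
    """Mutable claim state wrapped around a restriction; never serialized."""

    def __init__(self, restriction):
        self._restriction = restriction
        self._next = restriction.start

    def restriction(self):
        # Getter giving user code access to the wrapped restriction.
        return self._restriction

    def try_claim(self, position):
        # Reject positions already claimed or outside the restriction.
        if position < self._next or position >= self._restriction.end:
            return False
        self._next = position + 1
        return True


def encode(restriction):
    # Only the restriction crosses the wire.
    return json.dumps(asdict(restriction)).encode("utf-8")


def decode(data):
    return OffsetRestriction(**json.loads(data.decode("utf-8")))


def run(encoded):
    # Wrap the decoded restriction in a fresh tracker before "user code".
    tracker = OffsetTracker(decode(encoded))
    claimed = []
    position = tracker.restriction().start
    while tracker.try_claim(position):
        claimed.append(position)
        position += 1
    return claimed
```

For example, run(encode(OffsetRestriction(3, 6))) claims positions 3, 4, and 5, and no custom coder is needed to keep tracker state out of the encoded bytes.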


Thanks,
Daniel Oliveira


Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-16 Thread Daniel Oliveira
+1 (non-binding)

On Sat, Dec 14, 2019 at 5:24 PM Kyle Weaver  wrote:

> +1 (non-binding)
>
> On Sat, Dec 14, 2019 at 3:10 AM Jan Lukavský  wrote:
>
>> +1 (non-binding)
>> On 12/13/19 7:22 PM, Pablo Estrada wrote:
>>
>> +1 (binding)
>>
>> On Fri, Dec 13, 2019 at 8:47 AM Maximilian Michels 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> On 13.12.19 17:10, Jeff Klukas wrote:
>>> > +1 (non-binding)
>>> >
>>> > On Thu, Dec 12, 2019 at 11:58 PM Kenneth Knowles wrote:
>>> >
>>> > Please vote on the proposal for Beam's mascot to be the Firefly.
>>> > This encompasses the Lampyridae family of insects, without
>>> > specifying a genus or species.
>>> >
>>> > [ ] +1, Approve Firefly being the mascot
>>> > [ ] -1, Disapprove Firefly being the mascot
>>> >
>>> > The vote will be open for at least 72 hours excluding weekends. It
>>> > is adopted by at least 3 PMC +1 approval votes, with no PMC -1
>>> > disapproval votes*. Non-PMC votes are still encouraged.
>>> >
>>> > PMC voters, please help by indicating your vote as "(binding)"
>>> >
>>> > Kenn
>>> >
>>> > *I have chosen this format for this vote, even though Beam uses
>>> > simple majority as a rule, because I want any PMC member to be able
>>> > to veto based on concerns about overlap or trademark.
>>> >
>>>
>>


Re: [VOTE] Beam Mascot animal choice: vote for as many as you want

2019-11-25 Thread Daniel Oliveira
I'm also a bit late to the party.

[ ] Beaver
[ ] Hedgehog
[X] Lemur
[X] Owl
[ ] Salmon
[ ] Trout
[ ] Robot dinosaur
[X] Firefly
[X] Cuttlefish
[X] Dumbo Octopus
[ ] Angler fish

On Sun, Nov 24, 2019 at 8:37 AM Matthias Baetens 
wrote:

> In case I'm not too late:
>
> [ ] Beaver
> [ ] Hedgehog
> [ ] Lemur
> [ ] Owl
> [ ] Salmon
> [ ] Trout
> [ ] Robot dinosaur
> [X ] Firefly
> [ ] Cuttlefish
> [X ] Dumbo Octopus
> [ ] Angler fish
>
> I like angler fish a lot, but I think no one will join any meetups since
> they're scary as hell haha
>
>
> On Sun, Nov 24, 2019, 04:27 Kenneth Knowles  wrote:
>
>> David - if you can reconfigure the form so it is not anonymous (at least
>> to me) then I may be up for including those results in the tally. I don't
>> want to penalize those who voted via the form. But since there are now two
>> voting channels we have to dedupe or discard the form results. And I need
>> to be able to see which votes are PMC. Even if advisory, it does need to
>> move to a concluding vote, and PMC votes could be a tiebreaker of sorts.
>>
>> Kenn
>>
>> On Sat, Nov 23, 2019 at 7:17 PM Kenneth Knowles  wrote:
>>
>>> On Fri, Nov 22, 2019 at 10:24 AM Robert Bradshaw 
>>> wrote:
>>>
 On Thu, Nov 21, 2019 at 7:05 PM David Cavazos 
 wrote:

>
>
> I created this Google Form
> 
> if everyone is okay with it to make it easier to both vote and view the
> results :)
>

 Generally decisions, especially votes, for apache projects are supposed
 to happen on-list. I suppose this is more an advisory vote, but still
 probably makes sense to keep it here.

>>>
>>> Indeed. Someone suggested a Google form before I started this, but I
>>> deliberately didn't use it. It doesn't add much and it puts the vote off
>>> list onto opaque and mutable third party infrastructure.
>>>
>>> If you voted on the form, please repeat it on thread so I can count it.
>>>
>>> Kenn
>>>
>>>
>>>
import collections, pprint, re, requests

# Fetch the vote thread from the lists.apache.org archive API.
thread = requests.get(
    'https://lists.apache.org/api/thread.lua?id=ff60eabbf8349ba6951633869000356c2c2feb48bbff187cf3c60039@%3Cdev.beam.apache.org%3E'
).json()
counts = collections.defaultdict(int)
for email in thread['emails']:
    body = requests.get(
        'https://lists.apache.org/api/email.lua?id=%s' % email['mid']
    ).json()['body']
    # Count each checked box, e.g. "[X] Firefly" or "[ x ] Owl".
    for vote in re.findall(r'\n\s*\[\s*[xX]\s*\]\s*([a-zA-Z ]+)', body):
        counts[vote] += 1
    # Print the running tally, least-voted first.
    pprint.pprint(sorted(counts.items(), key=lambda kv: kv[-1]))

 ...

 [('Beaver', 1),
  ('Capybara', 2),
  ('Trout', 2),
  ('Salmon', 4),
  ('Dumbo Octopus', 7),
  ('Robot dinosaur', 9),
  ('Hedgehog', 10),
  ('Cuttlefish', 11),
  ('Angler fish', 12),
  ('Lemur', 14),
  ('Owl', 15),
  ('Firefly', 17)]



>
> On Thu, Nov 21, 2019 at 6:18 PM Vinay Mayar <
> vinay.ma...@expanseinc.com> wrote:
>
>> [ ] Beaver
>> [ ] Hedgehog
>> [ ] Lemur
>> [ ] Owl
>> [ ] Salmon
>> [ ] Trout
>> [ ] Robot dinosaur
>> [ ] Firefly
>> [ ] Cuttlefish
>> [x] Dumbo Octopus
>> [ ] Angler fish
>>
>> On Thu, Nov 21, 2019 at 6:14 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> [X] Beaver
>>> [ ] Hedgehog
>>> [ ] Lemur
>>> [X] Owl
>>> [ ] Salmon
>>> [ ] Trout
>>> [ ] Robot dinosaur
>>> [ ] Firefly
>>> [X ] Cuttlefish
>>> [X ] Dumbo Octopus
>>> [ X] Angler fish
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Nov 21, 2019 at 1:43 PM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
 [X] Beaver
 [ ] Hedgehog
 [X] Lemur
 [X] Owl
 [ ] Salmon
 [ ] Trout
 [X] Robot dinosaur
 [X] Firefly
 [ ] Cuttlefish
 [ ] Dumbo Octopus
 [ ] Angler fish

 On Thu, Nov 21, 2019 at 1:11 PM Aizhamal Nurmamat kyzy <
 aizha...@apache.org> wrote:

> [ ] Beaver
> [X] Hedgehog
> [ ] Lemur
> [ ] Owl
> [ ] Salmon
> [ ] Trout
> [ ] Robot dinosaur
> [ ] Firefly
> [X] Cuttlefish
> [ ] Dumbo Octopus
> [ ] Angler fish
>
> On Thu, Nov 21, 2019 at 11:21 AM Robert Burke 
> wrote:
>
>> [ X] Beaver
>> [] Hedgehog
>> [ x] Lemur
>> [ X] Owl
>> [ ] Salmon
>> [ ] Trout
>> [ ] Robot dinosaur
>> [X ] Firefly
>> [ X] Cuttlefish
>> [x ] Dumbo Octopus
>> [X ] Angler fish
>>
>> On Thu, Nov 21, 2019, 9:33 AM Łukasz Gajowy <
>> lukasz.gaj...@gmail.com> wrote:
>>
>>> [ ] Beaver
>>> [ ] Hedgehog
>

Re: [ANNOUNCE] New committer: Daniel Oliveira

2019-11-20 Thread Daniel Oliveira
Thank you everyone! I won't let you down. o7

On Wed, Nov 20, 2019 at 2:12 PM Ruoyun Huang  wrote:

> Congrats Daniel!
>
> On Wed, Nov 20, 2019 at 1:58 PM Robert Burke  wrote:
>
>> Congrats Daniel! Much deserved.
>>
>> On Wed, Nov 20, 2019, 12:49 PM Udi Meiri  wrote:
>>
>>> Congrats Daniel!
>>>
>>> On Wed, Nov 20, 2019 at 12:42 PM Kyle Weaver 
>>> wrote:
>>>
>>>> Congrats Dan! Keep up the good work :)
>>>>
>>>> On Wed, Nov 20, 2019 at 12:41 PM Cyrus Maden  wrote:
>>>>
>>>>> Congratulations! This is great news.
>>>>>
>>>>> On Wed, Nov 20, 2019 at 3:24 PM Rui Wang  wrote:
>>>>>
>>>>>> Congrats!
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Wed, Nov 20, 2019 at 11:48 AM Valentyn Tymofieiev <
>>>>>> valen...@google.com> wrote:
>>>>>>
>>>>>>> Congrats, Daniel!
>>>>>>>
>>>>>>> On Wed, Nov 20, 2019 at 11:47 AM Kenneth Knowles 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>>>>> committer: Daniel Oliveira
>>>>>>>>
>>>>>>>> Daniel introduced himself to dev@ over two years ago and has
>>>>>>>> contributed in many ways since then. Daniel has contributed to general
>>>>>>>> project health, the portability framework, and all three languages: 
>>>>>>>> Java,
>>>>>>>> Python SDK, and Go. I would like to particularly highlight how he 
>>>>>>>> deleted
>>>>>>>> 12k lines of dead reference runner code [1].
>>>>>>>>
>>>>>>>> In consideration of Daniel's contributions, the Beam PMC trusts him
>>>>>>>> with the responsibilities of a Beam committer [2].
>>>>>>>>
>>>>>>>> Thank you, Daniel, for your contributions and looking forward to
>>>>>>>> many more!
>>>>>>>>
>>>>>>>> Kenn, on behalf of the Apache Beam PMC
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/pull/8380
>>>>>>>> [2]
>>>>>>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>>>>
>>>>>>>
>
> --
> 
> Ruoyun  Huang
>
>


Re: [ANNOUNCE] New committer: Brian Hulette

2019-11-15 Thread Daniel Oliveira
Congratulations Brian! It's well deserved.

On Fri, Nov 15, 2019, 9:37 AM Alexey Romanenko 
wrote:

> Congratulations, Brian!
>
> On 15 Nov 2019, at 18:27, Rui Wang  wrote:
>
> Congrats!
>
>
> -Rui
>
> On Fri, Nov 15, 2019 at 8:16 AM Thomas Weise  wrote:
>
>> Congratulations!
>>
>>
>> On Fri, Nov 15, 2019 at 6:34 AM Connell O'Callaghan 
>> wrote:
>>
>>> Well done Brian!!!
>>>
>>> Kenn thank you for sharing
>>>
>>> On Fri, Nov 15, 2019 at 6:31 AM Cyrus Maden  wrote:
>>>
 Congrats Brian!

 On Fri, Nov 15, 2019 at 5:25 AM Ismaël Mejía  wrote:

> Congratulations Brian!
> Happy to see this happening and eager to see more of your work!
>
> On Fri, Nov 15, 2019 at 11:02 AM Ankur Goenka 
> wrote:
> >
> > Congrats Brian!
> >
> > On Fri, Nov 15, 2019, 2:42 PM Jan Lukavský  wrote:
> >>
> >> Congrats Brian!
> >>
> >> On 11/15/19 9:58 AM, Reza Rokni wrote:
> >>
> >> Great news!
> >>
> >> On Fri, 15 Nov 2019 at 15:09, Gleb Kanterov 
> wrote:
> >>>
> >>> Congratulations!
> >>>
> >>> On Fri, Nov 15, 2019 at 5:44 AM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> 
>  Congratulations, Brian!
> 
>  On Thu, Nov 14, 2019 at 6:25 PM jincheng sun <
> sunjincheng...@gmail.com> wrote:
> >
> > Congratulation Brian!
> >
> > Best,
> > Jincheng
> >
> > Kyle Weaver  于2019年11月15日周五 上午7:19写道:
> >>
> >> Thanks for your contributions and congrats Brian!
> >>
> >> On Thu, Nov 14, 2019 at 3:14 PM Kenneth Knowles <
> k...@apache.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Brian Hulette
> >>>
> >>> Brian introduced himself to dev@ earlier this year and has
> been contributing since then. His contributions to Beam include
> explorations of integration with Arrow, standardizing coders, portability
> for schemas, and presentations at Beam events.
> >>>
> >>> In consideration of Brian's contributions, the Beam PMC trusts
> him with the responsibilities of a Beam committer [1].
> >>>
> >>> Thank you, Brian, for your contributions and looking forward
> to many more!
> >>>
> >>> Kenn, on behalf of the Apache Beam PMC
> >>>
> >>> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> >>
> >>
> >>
> >> --
> >>
> >> This email may be confidential and privileged. If you received this
> communication by mistake, please don't forward it to anyone else, please
> erase all copies and attachments, and please let me know that it has gone
> to the wrong person.
> >>
> >> The above terms reflect a potential business arrangement, are
> provided solely as a basis for further discussion, and are not intended to
> be and do not constitute a legally binding obligation. No legally binding
> obligations will be created, implied, or inferred until an agreement in
> final form is executed in writing by all parties involved.
>

>


Re: Jenkins queue times steadily increasing for a few months now

2019-09-24 Thread Daniel Oliveira
Those ideas all sound good. I especially agree with trying to reduce tests
first; if we've done all we can there and latency is still too high, it
means we need more workers. In addition to reducing the number of tests,
we could also run less important tests less frequently, particularly
postcommits, since many of those are resource intensive. That would require
people with good context around what our many postcommits are used for.

Another idea I thought of is trying to run automated tests only
outside of peak coding times. Ideally, during the times when we get the
greatest amounts of PRs (and therefore precommits) we shouldn't have any
postcommits running. If we have both pre and postcommits going at the same
time during peak hours, our queue times will shoot up even if the total
amount of work doesn't change much.
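
Here is a toy model of that last point, with made-up numbers (the hourly arrival counts and the capacity of 10 jobs per hour are assumptions, not measurements from our Jenkins). Both schedules run the same 12 postcommit jobs; only their timing relative to the precommit peak differs.

```python
def peak_backlog(precommits, postcommits, capacity):
    """Largest number of jobs left waiting at the end of any hour."""
    backlog = 0
    worst = 0
    for pre, post in zip(precommits, postcommits):
        # Jobs that do not fit in this hour's capacity carry over.
        backlog = max(0, backlog + pre + post - capacity)
        worst = max(worst, backlog)
    return worst


# Hypothetical hourly arrivals; hours 2-4 are the peak coding time.
precommits = [2, 2, 8, 8, 8, 2]
overlapping = [0, 0, 4, 4, 4, 0]  # postcommits run during the peak
off_peak = [4, 4, 0, 0, 0, 4]     # same 12 postcommits, moved off-peak
capacity = 10                     # jobs the executors can finish per hour

print(peak_backlog(precommits, overlapping, capacity))
print(peak_backlog(precommits, off_peak, capacity))
```

In this model the overlapping schedule builds a backlog during the peak hours while the off-peak schedule never queues at all, even though the total amount of work is identical.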

Btw, you mentioned that this was a problem last year. Do you have any links
to discussions about that? It seems like it could be useful.

On Thu, Sep 19, 2019 at 1:10 PM Mikhail Gryzykhin  wrote:

> Hi Daniel,
>
> Generally this looks feasible, since jobs wait for a new worker to become
> available before they start.
>
> Over time we added more tests and did not deprecate enough, which increases
> the load on workers. I wonder if we can add something like the total runtime
> of all running jobs? This would be a safeguard metric showing the amount of
> time we actually spend running jobs. If it increases with the same number of
> workers, that will prove that we are overloading them (the inverse is not
> necessarily true).
>
> On addressing this, we can review the approaches we took last year and see if
> any of them apply. If I do some brainstorming, the following ideas come to
> mind: add more workers, reduce the number of tests, do a better job of
> filtering out irrelevant tests, cancel irrelevant jobs (i.e. cancel tests if
> the linter fails) and/or add an option for cancelling irrelevant jobs. One more
> big point can be effort on deflaking, but we seem to be decent in this area.
>
> Regards,
> Mikhail.
>
>
> On Thu, Sep 19, 2019 at 12:22 PM Daniel Oliveira 
> wrote:
>
>> Hi everyone,
>>
>> A little while ago I was taking a look at the Precommit Latency metrics
>> on Grafana (link
>> <http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1&from=now-90d&to=now>)
>> and saw that the monthly 90th percentile metric has been really increasing
>> the past few months, from around 10 minutes to currently around 30 minutes.
>>
>> After doing some light digging I was shown this page (beam load
>> statistics
>> <https://builds.apache.org/label/beam/load-statistics?type=min>) which
>> seems to imply that queue times are shooting up when all the test executors
>> are occupied, and it seems this is happening longer and more often
>> recently. I also took a look at the commit history for our Jenkins tests
>> <https://github.com/apache/beam/commits/master?after=864e2e0cac88ee317ca600dafe31ec4f527d5d5f+34&path%5B%5D=.test-infra&path%5B%5D=jenkins>
>> and I see that new tests have steadily been added.
>>
>> I wanted to bring this up with the dev@ to ask:
>>
>> 1. Is this accurate? Can anyone provide insight into the metrics? Does
>> anyone know how to double check my assumptions with more concrete metrics?
>>
>> 2. Does anyone have ideas on how to address this?
>>
>> Thanks,
>> Daniel Oliveira
>>
>


Jenkins queue times steadily increasing for a few months now

2019-09-19 Thread Daniel Oliveira
Hi everyone,

A little while ago I was taking a look at the Precommit Latency metrics on
Grafana (link
<http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1&from=now-90d&to=now>)
and saw that the monthly 90th percentile metric has been really increasing
the past few months, from around 10 minutes to currently around 30 minutes.

After doing some light digging I was shown this page (beam load statistics
<https://builds.apache.org/label/beam/load-statistics?type=min>) which
seems to imply that queue times are shooting up when all the test executors
are occupied, and it seems this is happening longer and more often
recently. I also took a look at the commit history for our Jenkins tests
<https://github.com/apache/beam/commits/master?after=864e2e0cac88ee317ca600dafe31ec4f527d5d5f+34&path%5B%5D=.test-infra&path%5B%5D=jenkins>
and I see that new tests have steadily been added.

I wanted to bring this up with the dev@ to ask:

1. Is this accurate? Can anyone provide insight into the metrics? Does
anyone know how to double check my assumptions with more concrete metrics?

2. Does anyone have ideas on how to address this?

Thanks,
Daniel Oliveira


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Daniel Oliveira
Congratulations Valentyn!

On Tue, Aug 27, 2019, 11:31 AM Boyuan Zhang  wrote:

> Congratulations!
>
> On Tue, Aug 27, 2019 at 10:44 AM Udi Meiri  wrote:
>
>> Congrats!
>>
>> On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:
>>
>>> Congrats Valentyn!
>>>
>>> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
>>> wrote:
>>>
 Thank you everyone!

 On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congrats, well deserved!
>
> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>
> Congrats Valentyn!
> On 8/26/19 11:43 PM, Rui Wang wrote:
>
> Congratulations!
>
>
> -Rui
>
> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
> wrote:
>
>> Congratulations Valentyn, well deserved!
>>
>> On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> Congrats Valentyn!
>>>
>>> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
>>> wrote:
>>>
 Thanks Valentyn!

 On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu 
 wrote:

> Thank you Valentyn! Congratulations!
>
> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw <
> rober...@google.com> wrote:
>
>> Hi,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Valentyn Tymofieiev
>>
>> Valentyn has made numerous contributions to Beam over the last
>> several
>> years (including 100+ pull requests), most recently pushing
>> through
>> the effort to make Beam compatible with Python 3. He is also an
>> active
>> participant in design discussions on the list, participates in
>> release
>> candidate validation, and proactively helps keep our tests green.
>>
>> In consideration of Valentyn's contributions, the Beam PMC trusts
>> him
>> with the responsibilities of a Beam committer [1].
>>
>> Thank you, Valentyn, for your contributions and looking forward
>> to many more!
>>
>> Robert, on behalf of the Apache Beam PMC
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>
>


Proposal for SDFs in the Go SDK

2019-08-07 Thread Daniel Oliveira
Hello Beam devs,

I've been working on a proposal for implementing SDFs in the Go SDK. For
those who were unaware, the Go SDK hasn't supported SDFs in any capacity
yet, so my proposal covers the user-facing API and a basic look into how it
will work under the hood.

I'd appreciate it if anyone interested in the Go SDK or anyone who's been
working with portable SDFs could give it a look and provide some feedback.
There's still a few open questions mentioned in the doc that I'd like to
get feedback on before deciding on anything.

https://docs.google.com/document/d/14IwJYEUpar5FmiPNBFvERADiShZjsrsMpgtlntPVCX0/edit?usp=sharing

Thanks,
Daniel Oliveira


Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Daniel Oliveira
Any updates to this issue today? It seems like this (or a similar bug) is
still happening across many Pre and Postcommits.

On Fri, Jun 28, 2019 at 12:33 AM Yifan Zou  wrote:

> I did the prune on beam15. The disk was freed up, but all jobs fail with
> other weird problems. It looks like docker prune was overkill, but I don't
> have evidence. Will look further in the AM.
>
> On Thu, Jun 27, 2019 at 11:20 PM Udi Meiri  wrote:
>
>> See how the hdfs IT already avoids tag collisions.
>>
>> On Thu, Jun 27, 2019, 20:42 Yichi Zhang  wrote:
>>
>>> For flakiness, I guess a tag is needed to keep concurrent builds
>>> apart.
>>>
>>> On Thu, Jun 27, 2019 at 8:39 PM Yichi Zhang  wrote:
>>>
 maybe a cron job on jenkins node that does docker prune every day?

 On Thu, Jun 27, 2019 at 6:58 PM Ankur Goenka  wrote:

> This highlights the race condition caused by using a single docker
> registry on a machine.
> If 2 tests create "jenkins-docker-apache.bintray.io/beam/python" one
> after another, then the 2nd one will replace the 1st one and cause
> flakiness.
>
> Is there a way to dynamically create and destroy a docker repository on
> a machine and clean all the relevant data?
>
> On Thu, Jun 27, 2019 at 3:15 PM Yifan Zou  wrote:
>
>> The problem was because of the large quantity of stale docker images
>> generated by the Python portable tests and HDFS IT.
>>
>> Dumping the docker disk usage gives me:
>>
>> TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
>> *Images         1039    356     424GB     384.2GB (90%)*
>> Containers      987     2       2.042GB   2.041GB (99%)
>> Local Volumes   126     0       392.8MB   392.8MB (100%)
>>
>> REPOSITORY                                                 TAG     IMAGE ID      CREATED       SIZE     SHARED SIZE  UNIQUE SIZE  CONTAINERS
>> jenkins-docker-apache.bintray.io/beam/python3              latest  ff1b949f4442  22 hours ago  1.639GB  922.3MB      716.9MB      0
>> jenkins-docker-apache.bintray.io/beam/python               latest  1dda7b9d9748  22 hours ago  1.624GB  913.7MB      710.3MB      0
>> <none>                                                     <none>  05458187a0e3  22 hours ago  732.9MB  625.1MB      107.8MB      4
>> <none>                                                     <none>  896f35dd685f  23 hours ago  1.639GB  922.3MB      716.9MB      0
>> <none>                                                     <none>  db4d24ca9f2b  23 hours ago  1.624GB  913.7MB      710.3MB      0
>> <none>                                                     <none>  547df4d71c31  23 hours ago  732.9MB  625.1MB      107.8MB      4
>> <none>                                                     <none>  dd7d9582c3e0  23 hours ago  1.639GB  922.3MB      716.9MB      0
>> <none>                                                     <none>  664aae255239  23 hours ago  1.624GB  913.7MB      710.3MB      0
>> <none>                                                     <none>  b528fedf9228  23 hours ago  732.9MB  625.1MB      107.8MB      4
>> <none>                                                     <none>  8e996f22435e  25 hours ago  1.624GB  913.7MB      710.3MB      0
>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-818_test  latest  24b73b3fec06  25 hours ago  1.305GB  965.7MB      339.5MB      0
>> <none>                                                     <none>  096325fb48de  25 hours ago  732.9MB  625.1MB      107.8MB      2
>> jenkins-docker-apache.bintray.io/beam/java                 latest  c36d8ff2945d  25 hours ago  685.6MB  625.1MB      60.52MB      0
>> <none>                                                     <none>  11c86ebe025f  26 hours ago  1.639GB  922.3MB      716.9MB      0
>> <none>                                                     <none>  2ecd69c89ec1  26 hours ago  1.624GB  913.7MB      710.3MB      0
>> hdfs_it-jenkins-beam_postcommit_python_verify-8590_test    latest  3d1d589d44fe  2 days ago    1.305GB  965.7MB      339.5MB      0
>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test  latest  d1cc503ebe8e  2 days ago    1.305GB
>>

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-20 Thread Daniel Oliveira
Pablo has merged the PR in and assigned a tag to the commit to make the ULR
code easy to find in the future (java-ulr-removal
<https://github.com/apache/beam/tree/java-ulr-removal>). The Java ULR is
officially removed!

On Fri, May 17, 2019 at 4:59 PM Daniel Oliveira 
wrote:

> It's been 72 hours and this vote has passed.
>
> There are 10 approving votes, 5 of which are binding:
> * Lukasz Cwik
> * Ahmet Altay
> * Pablo Estrada
> * Robert Bradshaw
> * Maximilian Michels
>
> There are no disapproving votes.
>
> With that decided, I'll get someone to merge the change on Monday (I'm
> hesitant to do a big merge right before a weekend).
>
> *From: *Michael Luckey 
> *Date: *Wed, May 15, 2019 at 5:26 AM
> *To: * 
>
> +1
>>
>> On Wed, May 15, 2019 at 2:17 PM Alex Van Boxel  wrote:
>>
>>> +1
>>>
>>> (best commits are the ones that remove code :-)
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Wed, May 15, 2019 at 2:04 PM Manu Zhang 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, May 15, 2019 at 7:57 PM Maximilian Michels 
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On 15.05.19 13:19, Robert Bradshaw wrote:
>>>>> > +1 for removing the code given the current state of things.
>>>>> >
>>>>> > On Wed, May 15, 2019 at 12:32 AM Ruoyun Huang 
>>>>> wrote:
>>>>> >>
>>>>> >> +1
>>>>> >>
>>>>> >> From: Daniel Oliveira 
>>>>> >> Date: Tue, May 14, 2019 at 2:19 PM
>>>>> >> To: dev
>>>>> >>
>>>>> >>> Hello everyone,
>>>>> >>>
>>>>> >>> I'm calling for a vote on removing the deprecated Java Reference
>>>>> Runner code. The PR for the change has already been tested and reviewed:
>>>>> https://github.com/apache/beam/pull/8380
>>>>> >>>
>>>>> >>> [ ] +1, Approve merging the removal PR in its current state
>>>>> >>> [ ] -1, Veto the removal PR (please provide specific comments)
>>>>> >>>
>>>>> >>> The vote will be open for at least 72 hours. Since this is a vote on
>>>>> code-modification, it is adopted if there are at least 3 PMC affirmative
>>>>> votes and no vetoes.
>>>>> >>>
>>>>> >>> For those who would like context on why the Java Reference Runner
>>>>> is being deprecated, the discussions took place in the following email
>>>>> threads:
>>>>> >>>
>>>>> >>> (8 Feb. 2019) Thoughts on a reference runner to invest in? -
>>>>> Decision to deprecate the Java Reference Runner and use the Python
>>>>> FnApiRunner for those use cases instead.
>>>>> >>> (14 Mar. 2019) Python PVR Reference post-commit tests failing -
>>>>> Removal of Reference Runner Post-Commits from Jenkins, and discussion on
>>>>> removal of code.
>>>>> >>> (25 Apr. 2019) Removing Java Reference Runner code - Discussion
>>>>> thread before this formal vote.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> 
>>>>> >> Ruoyun  Huang
>>>>> >>
>>>>>
>>>>


Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-17 Thread Daniel Oliveira
It's been 72 hours and this vote has passed.

There are 10 approving votes, 5 of which are binding:
* Lukasz Cwik
* Ahmet Altay
* Pablo Estrada
* Robert Bradshaw
* Maximilian Michels

There are no disapproving votes.

With that decided, I'll get someone to merge the change on Monday (I'm
hesitant to do a big merge right before a weekend).

*From: *Michael Luckey 
*Date: *Wed, May 15, 2019 at 5:26 AM
*To: * 

+1
>
> On Wed, May 15, 2019 at 2:17 PM Alex Van Boxel  wrote:
>
>> +1
>>
>> (best commits are the ones that remove code :-)
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Wed, May 15, 2019 at 2:04 PM Manu Zhang 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, May 15, 2019 at 7:57 PM Maximilian Michels 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On 15.05.19 13:19, Robert Bradshaw wrote:
>>>> > +1 for removing the code given the current state of things.
>>>> >
>>>> > On Wed, May 15, 2019 at 12:32 AM Ruoyun Huang 
>>>> wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> From: Daniel Oliveira 
>>>> >> Date: Tue, May 14, 2019 at 2:19 PM
>>>> >> To: dev
>>>> >>
>>>> >>> Hello everyone,
>>>> >>>
>>>> >>> I'm calling for a vote on removing the deprecated Java Reference
>>>> Runner code. The PR for the change has already been tested and reviewed:
>>>> https://github.com/apache/beam/pull/8380
>>>> >>>
>>>> >>> [ ] +1, Approve merging the removal PR in its current state
>>>> >>> [ ] -1, Veto the removal PR (please provide specific comments)
>>>> >>>
>>>> >>> The vote will be open for at least 72 hours. Since this is a vote on
>>>> code-modification, it is adopted if there are at least 3 PMC affirmative
>>>> votes and no vetoes.
>>>> >>>
>>>> >>> For those who would like context on why the Java Reference Runner
>>>> is being deprecated, the discussions took place in the following email
>>>> threads:
>>>> >>>
>>>> >>> (8 Feb. 2019) Thoughts on a reference runner to invest in? -
>>>> Decision to deprecate the Java Reference Runner and use the Python
>>>> FnApiRunner for those use cases instead.
>>>> >>> (14 Mar. 2019) Python PVR Reference post-commit tests failing -
>>>> Removal of Reference Runner Post-Commits from Jenkins, and discussion on
>>>> removal of code.
>>>> >>> (25 Apr. 2019) Removing Java Reference Runner code - Discussion
>>>> thread before this formal vote.
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> 
>>>> >> Ruoyun  Huang
>>>> >>
>>>>
>>>


[VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Daniel Oliveira
Hello everyone,

I'm calling for a vote on removing the deprecated Java Reference Runner
code. The PR for the change has already been tested and reviewed:
https://github.com/apache/beam/pull/8380

[ ] +1, Approve merging the removal PR in its current state
[ ] -1, Veto the removal PR (please provide specific comments)

The vote will be open for at least 72 hours. Since this is a vote on
code-modification, it is adopted if there are at least 3 PMC affirmative
votes and no vetoes.
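
As a rough illustration of the adoption rule above, here is a minimal Python sketch. The `Vote` type, the `binding` flag, and the function name are made up for this example, and it assumes that only binding (PMC) -1 votes count as vetoes; it is not part of any Apache or Beam tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on the thread (hypothetical type for illustration)."""
    voter: str
    value: int     # +1 to approve, -1 to veto
    binding: bool  # True when cast by a PMC member


def code_modification_vote_passes(votes, min_binding_approvals=3):
    """Apply the rule: at least 3 PMC affirmative votes and no vetoes.

    Assumption: only binding (PMC) votes are counted, both for
    approvals and for vetoes.
    """
    binding_votes = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding_votes if v.value > 0)
    vetoes = sum(1 for v in binding_votes if v.value < 0)
    return approvals >= min_binding_approvals and vetoes == 0
```

Non-binding votes are still welcome on the thread; under the assumption above they simply do not change the outcome.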

For those who would like context on why the Java Reference Runner is being
deprecated, the discussions took place in the following email threads:

   1. (8 Feb. 2019) Thoughts on a reference runner to invest in? -
   Decision to deprecate the Java Reference Runner and use the Python
   FnApiRunner for those use cases instead.
   2. (14 Mar. 2019) Python PVR Reference post-commit tests failing -
   Removal of Reference Runner Post-Commits from Jenkins, and discussion on
   removal of code.
   3. (25 Apr. 2019) Removing Java Reference Runner code - Discussion
   thread before this formal vote.


Re: Removing Java Reference Runner code

2019-04-30 Thread Daniel Oliveira
It sounds like no one has any objections specifically to removing this
code. I'll get someone to review the PR and I'll start a vote to merge it
as soon as it's approved.
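
For anyone unfamiliar with the "loopback" mode Robert mentions in the quoted message below, here is a minimal sketch of the Python SDK pipeline flags involved. It assumes a portable job server is already listening on localhost:8099; that endpoint and the helper name are placeholders for illustration only.

```python
def loopback_pipeline_args(job_endpoint="localhost:8099"):
    """Return flags for running a Python pipeline on a portable runner
    with the submitting process also acting as the worker (loopback).

    The default endpoint is an assumption; use whatever address your
    job server actually listens on.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--environment_type=LOOPBACK",
    ]


if __name__ == "__main__":
    # Print the flags so they can be pasted onto a pipeline command line.
    print(" ".join(loopback_pipeline_args()))
```

With these flags the user code runs in the launching process, so print statements and breakpoints should behave much as they do on a direct runner.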

On Mon, Apr 29, 2019 at 3:39 AM Robert Bradshaw  wrote:

> I'd imagine that most users will continue to debug their pipelines
> using a direct runner, and even if the portable runner is used it can
> be run in "loopback" mode where the pipeline-submitting process also
> acts as the worker(s), so one can output print statements, set
> breakpoints, etc. as if it were all in-process (unless there's
> actually something strange with the runner <-> SDK API itself).
>
> Similarly, for development, many (most) features (IO, SQL, schemas)
> are runner-agnostic, though of course this is not always the case
> especially if there are fundamental changes to the model (e.g. one
> that comes to mind is retractions).
>
> That's not to say there isn't also value in testing your code on a
> portable runner that will more faithfully represent production
> environments, but at this level of integration test (e.g. using docker
> and all) I don't think having Python is that high of a barrier.
>
> As for a gradle command to run JVR tests on the Python ULR, I don't
> think that's currently available, but it should be.
>
>
>
> On Sat, Apr 27, 2019 at 4:53 AM Daniel Oliveira 
> wrote:
> >
> > Hey Boyuan,
> >
> > I think that's a good question. Mikhail's mostly right, that the user
> shouldn't need to know how the Python ULR works for their debugging. This
> is actually more of an issue with portability itself anyway. Even when I
> was coding Java pipelines on the Java ULR, if something went wrong in the
> runner it was still really difficult to debug. Hopefully the only people
> that will need to do that painful exercise are Beam devs doing development
> work on the runners. If an average user is having a problem, the runner's
> logs and error messages should be effective enough that the user shouldn't
> care what language the runner is using or how it's implemented.
> >
> > On Fri, Apr 26, 2019 at 12:36 PM Boyuan Zhang 
> wrote:
> >>
> >> Another concern from me is, will it be difficult for a Java person (who
> is developing the Java SDK) to figure out what's going on in the Python ULR
> when debugging?
> >>
> >> On Fri, Apr 26, 2019 at 12:05 PM Kenneth Knowles 
> wrote:
> >>>
> >>> Good points. Distilling one single item: can I, today, run the Java
> SDK's suite of ValidatesRunner command against the Python ULR + Java SDK
> Harness, in a single Gradle command?
> >>>
> >>> Kenn
> >>>
> >>> On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin  wrote:
> >>>>
> >>>> If there are no plans to invest in ULR then it makes sense to remove
> it.
> >>>>
> >>>> Going forward, however, I think we should try to document the higher
> level approach we're taking with runners (and portability) now that we have
> something working and can reflect on it. For example, couple of things that
> are not 100% clear to me:
> >>>>  - if the focus is on python runner for portability efforts, how does
> java SDK (and other languages) tie into this? E.g. how do we run, test,
> measure, and develop things (pipelines, aspects of the SDK, runner);
> >>>>  - what's our approach to developing new features, should we make
> sure python runner supports them as early as possible (e.g. schemas and
> SQL)?
> >>>>  - java DirectRunner is still there:
> >>>> - it is still the primary tool for java SDK development purposes,
> and as Kenn mentioned in the linked threads it adds value by making sure
> users don't rely on implementation details of specific runners. Do we have
> a similar story for portable scenarios?
> >>>> - I assume that extra validations in the DirectRunner have impact
> on performance in various ways (potentially non-deterministic). While this
> doesn't matter in some cases, it might do in others. Having a local runner
> that is (better) optimized for execution would probably make more sense for
> perf measurements, integration tests, and maybe even local production jobs.
> Is this something potentially worth looking into?
> >>>>
> >>>> Regards,
> >>>> Anton
> >>>>
> >>>>
> >>>> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels 
> wrote:
> >>>>>
> >>>>> Thanks for following up with this.

Re: Removing Java Reference Runner code

2019-04-26 Thread Daniel Oliveira
Hey Boyuan,

I think that's a good question. Mikhail's mostly right, that the user
shouldn't need to know how the Python ULR works for their debugging. This
is actually more of an issue with portability itself anyway. Even when I
was coding Java pipelines on the Java ULR, if something went wrong in the
runner it was still really difficult to debug. Hopefully the only people
that will need to do that painful exercise are Beam devs doing development
work on the runners. If an average user is having a problem, the runner's
logs and error messages should be effective enough that the user shouldn't
care what language the runner is using or how it's implemented.

On Fri, Apr 26, 2019 at 12:36 PM Boyuan Zhang  wrote:

> Another concern from me is, will it be difficult for a Java person (who
> is developing the Java SDK) to figure out what's going on in the Python ULR
> when debugging?
>
> On Fri, Apr 26, 2019 at 12:05 PM Kenneth Knowles  wrote:
>
>> Good points. Distilling one single item: can I, today, run the Java SDK's
>> suite of ValidatesRunner command against the Python ULR + Java SDK Harness,
>> in a single Gradle command?
>>
>> Kenn
>>
>> On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin  wrote:
>>
>>> If there are no plans to invest in ULR then it makes sense to remove it.
>>>
>>> Going forward, however, I think we should try to document the higher
>>> level approach we're taking with runners (and portability) now that we have
>>> something working and can reflect on it. For example, couple of things that
>>> are not 100% clear to me:
>>>  - if the focus is on python runner for portability efforts, how does
>>> java SDK (and other languages) tie into this? E.g. how do we run, test,
>>> measure, and develop things (pipelines, aspects of the SDK, runner);
>>>  - what's our approach to developing new features, should we make sure
>>> python runner supports them as early as possible (e.g. schemas and SQL)?
>>>  - java DirectRunner is still there:
>>> - it is still the primary tool for java SDK development purposes,
>>> and as Kenn mentioned in the linked threads it adds value by making sure
>>> users don't rely on implementation details of specific runners. Do we have
>>> a similar story for portable scenarios?
>>> - I assume that extra validations in the DirectRunner have impact on
>>> performance in various ways (potentially non-deterministic). While this
>>> doesn't matter in some cases, it might do in others. Having a local runner
>>> that is (better) optimized for execution would probably make more sense for
>>> perf measurements, integration tests, and maybe even local production jobs.
>>> Is this something potentially worth looking into?
>>>
>>> Regards,
>>> Anton
>>>
>>>
>>> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels 
>>> wrote:
>>>
>>>> Thanks for following up with this. I have mixed feelings to see the
>>>> portable Java DirectRunner go, but I'm in favor of this change because
>>>> it removes a lot of code that we do not really make use of.
>>>>
>>>> -Max
>>>>
>>>> On 26.04.19 02:58, Kenneth Knowles wrote:
>>>> > Thanks for providing all this background on the PR. It is very easy
>>>> to
>>>> > see where it came from. Definitely nice to have less code and fewer
>>>> > things that can break. Perhaps lazy consensus is enough.
>>>> >
>>>> > Kenn
>>>> >
>>>> > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <
>>>> danolive...@google.com
>>>> > <mailto:danolive...@google.com>> wrote:
>>>> >
>>>> > Hey everyone,
>>>> >
>>>> > I made a preliminary PR for removing all the Java Reference Runner
>>>> > code (PR-8380 <https://github.com/apache/beam/pull/8380>) since I
>>>> > wanted to see if it could be done easily. It seems to be working
>>>> > fine, so I wanted to open up this discussion to make sure people
>>>> are
>>>> > still in agreement on getting rid of this code and that people
>>>> don't
>>>> > have any concerns.
>>>> >
>>>> > For those who need additional context about this, this previous
>>>> > thread
>>>> > <
>>>> https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E
>>>> >
>>>> > is where we discussed deprecating the Java Reference Runner (in
>>>> some
>>>> > places it's called the ULR or Universal Local Runner, but it's the
>>>> > same thing). Then there's this thread
>>>> > <
>>>> https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E
>>>> >
>>>> > where we discussed removing the code from the repo since it's been
>>>> > deprecated.
>>>> >
>>>> > If no one has any objections to trying to remove the code I'll
>>>> have
>>>> > someone review the PR I wrote and start a vote to have it merged.
>>>> >
>>>> > Thanks,
>>>> > Daniel Oliveira
>>>> >
>>>>
>>>


Re: Removing Java Reference Runner code

2019-04-26 Thread Daniel Oliveira
Hey Kenn,

I'm not 100% sure. Robert (+Robert Bradshaw) could
answer your question accurately. Last I checked (about 2 months ago) there
was no such target, but I don't think there's anything preventing one from
being written.

On Fri, Apr 26, 2019 at 12:05 PM Kenneth Knowles  wrote:

> Good points. Distilling one single item: can I, today, run the Java SDK's
> suite of ValidatesRunner command against the Python ULR + Java SDK Harness,
> in a single Gradle command?
>
> Kenn
>
> On Fri, Apr 26, 2019 at 9:54 AM Anton Kedin  wrote:
>
>> If there are no plans to invest in ULR then it makes sense to remove it.
>>
>> Going forward, however, I think we should try to document the higher
>> level approach we're taking with runners (and portability) now that we have
>> something working and can reflect on it. For example, couple of things that
>> are not 100% clear to me:
>>  - if the focus is on python runner for portability efforts, how does
>> java SDK (and other languages) tie into this? E.g. how do we run, test,
>> measure, and develop things (pipelines, aspects of the SDK, runner);
>>  - what's our approach to developing new features, should we make sure
>> python runner supports them as early as possible (e.g. schemas and SQL)?
>>  - java DirectRunner is still there:
>> - it is still the primary tool for java SDK development purposes, and
>> as Kenn mentioned in the linked threads it adds value by making sure users
>> don't rely on implementation details of specific runners. Do we have a
>> similar story for portable scenarios?
>> - I assume that extra validations in the DirectRunner have impact on
>> performance in various ways (potentially non-deterministic). While this
>> doesn't matter in some cases, it might do in others. Having a local runner
>> that is (better) optimized for execution would probably make more sense for
>> perf measurements, integration tests, and maybe even local production jobs.
>> Is this something potentially worth looking into?
>>
>> Regards,
>> Anton
>>
>>
>> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels 
>> wrote:
>>
>>> Thanks for following up with this. I have mixed feelings to see the
>>> portable Java DirectRunner go, but I'm in favor of this change because
>>> it removes a lot of code that we do not really make use of.
>>>
>>> -Max
>>>
>>> On 26.04.19 02:58, Kenneth Knowles wrote:
>>> > Thanks for providing all this background on the PR. It is very easy to
>>> > see where it came from. Definitely nice to have less code and fewer
>>> > things that can break. Perhaps lazy consensus is enough.
>>> >
>>> > Kenn
>>> >
>>> > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <
>>> danolive...@google.com
>>> > <mailto:danolive...@google.com>> wrote:
>>> >
>>> > Hey everyone,
>>> >
>>> > I made a preliminary PR for removing all the Java Reference Runner
>>> > code (PR-8380 <https://github.com/apache/beam/pull/8380>) since I
>>> > wanted to see if it could be done easily. It seems to be working
>>> > fine, so I wanted to open up this discussion to make sure people
>>> are
>>> > still in agreement on getting rid of this code and that people
>>> don't
>>> > have any concerns.
>>> >
>>> > For those who need additional context about this, this previous
>>> > thread
>>> > <
>>> https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E
>>> >
>>> > is where we discussed deprecating the Java Reference Runner (in
>>> some
>>> > places it's called the ULR or Universal Local Runner, but it's the
>>> > same thing). Then there's this thread
>>> > <
>>> https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E
>>> >
>>> > where we discussed removing the code from the repo since it's been
>>> > deprecated.
>>> >
>>> > If no one has any objections to trying to remove the code I'll have
>>> > someone review the PR I wrote and start a vote to have it merged.
>>> >
>>> > Thanks,
>>> > Daniel Oliveira
>>> >
>>>
>>


Re: Removing Java Reference Runner code

2019-04-26 Thread Daniel Oliveira
s, it might do in others. Having a local runner
> that is (better) optimized for execution would probably make more sense for
> perf measurements, integration tests, and maybe even local production jobs.
> Is this something potentially worth looking into?
>
> Regards,
> Anton
>
>
> On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels  wrote:
>
>> Thanks for following up with this. I have mixed feelings to see the
>> portable Java DirectRunner go, but I'm in favor of this change because
>> it removes a lot of code that we do not really make use of.
>>
>> -Max
>>
>> On 26.04.19 02:58, Kenneth Knowles wrote:
>> > Thanks for providing all this background on the PR. It is very easy to
>> > see where it came from. Definitely nice to have less code and fewer
>> > things that can break. Perhaps lazy consensus is enough.
>> >
>> > Kenn
>> >
>> > On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira <danolive...@google.com
>> > <mailto:danolive...@google.com>> wrote:
>> >
>> > Hey everyone,
>> >
>> > I made a preliminary PR for removing all the Java Reference Runner
>> > code (PR-8380 <https://github.com/apache/beam/pull/8380>) since I
>> > wanted to see if it could be done easily. It seems to be working
>> > fine, so I wanted to open up this discussion to make sure people are
>> > still in agreement on getting rid of this code and that people don't
>> > have any concerns.
>> >
>> > For those who need additional context about this, this previous
>> > thread
>> > <
>> https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E
>> >
>> > is where we discussed deprecating the Java Reference Runner (in some
>> > places it's called the ULR or Universal Local Runner, but it's the
>> > same thing). Then there's this thread
>> > <
>> https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E
>> >
>> > where we discussed removing the code from the repo since it's been
>> > deprecated.
>> >
>> > If no one has any objections to trying to remove the code I'll have
>> > someone review the PR I wrote and start a vote to have it merged.
>> >
>> > Thanks,
>> > Daniel Oliveira
>> >
>>
>


Removing Java Reference Runner code

2019-04-25 Thread Daniel Oliveira
Hey everyone,

I made a preliminary PR for removing all the Java Reference Runner code (
PR-8380 <https://github.com/apache/beam/pull/8380>) since I wanted to see
if it could be done easily. It seems to be working fine, so I wanted to
open up this discussion to make sure people are still in agreement on
getting rid of this code and that people don't have any concerns.

For those who need additional context about this, this previous thread
<https://lists.apache.org/thread.html/b235f8ee55a737ea399756edd80b1218ed34d3439f7b0ed59bfa8e40@%3Cdev.beam.apache.org%3E>
is where we discussed deprecating the Java Reference Runner (in some places
it's called the ULR or Universal Local Runner, but it's the same thing).
Then there's this thread
<https://lists.apache.org/thread.html/0b68efce9b7f2c5297b32d09e5d903e9b354199fe2ce446fbcd240bc@%3Cdev.beam.apache.org%3E>
where we discussed removing the code from the repo since it's been
deprecated.

If no one has any objections to trying to remove the code I'll have someone
review the PR I wrote and start a vote to have it merged.

Thanks,
Daniel Oliveira


Re: [ANNOUNCE] New committer announcement: Boyuan Zhang

2019-04-10 Thread Daniel Oliveira
Congrats Boyuan!

On Wed, Apr 10, 2019 at 10:20 AM Rui Wang  wrote:

> So well deserved!
>
> -Rui
>
> On Wed, Apr 10, 2019 at 10:12 AM Pablo Estrada  wrote:
>
>> Well deserved : ) congrats Boyuan!
>>
>> On Wed, Apr 10, 2019 at 10:08 AM Aizhamal Nurmamat kyzy <
>> aizha...@google.com> wrote:
>>
>>> Congratulations Boyuan!
>>>
>>> On Wed, Apr 10, 2019 at 9:52 AM Ruoyun Huang  wrote:
>>>
>>>> Thanks for your contributions and congratulations Boyuan!
>>>>
>>>> On Wed, Apr 10, 2019 at 9:00 AM Kenneth Knowles
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Please join me and the rest of the Beam PMC in welcoming a new
>>>>> committer: Boyuan Zhang.
>>>>>
>>>>> Boyuan has been contributing to Beam since early 2018. She has
>>>>> proposed 100+ pull requests across a wide range of topics: bug fixes,
>>>>> integration tests, build improvements, metrics features, release
>>>>> automation. Two big picture things to highlight are building/releasing
>>>>> Beam Python wheels and managing the donation of the Beam Dataflow Java
>>>>> Worker, including help with I.P. clearance.
>>>>>
>>>>> In consideration of Boyuan's contributions, the Beam PMC trusts Boyuan
>>>>> with the responsibilities of a Beam committer [1].
>>>>>
>>>>> Thank you, Boyuan, for your contributions.
>>>>>
>>>>> Kenn
>>>>>
>>>>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Ruoyun  Huang




Re: SNAPSHOTS have not been updated since february

2019-03-26 Thread Daniel Oliveira
I made a bug for this specific issue (artifacts not publishing to the
Apache Maven repo): https://issues.apache.org/jira/browse/BEAM-6919

While I was gathering info for the bug report I also noticed +Yifan Zou
has an experimental PR testing a fix:
https://github.com/apache/beam/pull/8148

On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang  wrote:

> +Daniel Oliveira 
>
> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang  wrote:
>
>> Sorry for the typo. Ideally, the snapshot publish is *independent* from
>> postrelease_snapshot.
>>
>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang  wrote:
>>
>>> Hey,
>>>
>>> I'm trying to publish the artifacts by commenting "Run Gradle Publish"
>>> in my PR, but there are several errors saying "cannot write artifacts
>>> into dir"
>>> <https://scans.gradle.com/s/g4uwxrj5gsizo/console-log?task=:beam-examples-java:publishMavenJavaPublicationToMavenRepository>,
>>> anyone has idea on it? Ideally, the snapshot publish is dependent from
>>> postrelease_snapshot. The publish task is to build and publish artifacts
>>> and the postrelease_snapshot is to verify whether the snapshot works.
>>>
>>> On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay  wrote:
>>>
>>>> I believe this is related to
>>>> https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang
>>>> has a fix in progress
>>>> https://github.com/apache/beam/pull/8132
>>>>
>>>> On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía  wrote:
>>>>
>>>>> I was trying to validate a fix on the Spark runner and realized that
>>>>> Beam SNAPSHOTS have not been updated since February 24 !
>>>>>
>>>>>
>>>>> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
>>>>>
>>>>> Can somebody please take a look at why this has not been updated?
>>>>>
>>>>> Thanks,
>>>>> Ismaël
>>>>>
>>>>


Re: Python PVR Reference post-commit tests failing

2019-03-15 Thread Daniel Oliveira
The ULR used a bunch of code forked from the DirectRunner but I don't think
it currently shares anything. And if it does share any code that I don't
know about I expect that the dependency is one-way, i.e. removing the ULR
shouldn't affect the DirectRunner. The only shared code I know of is
between the ULR and other portable runners, particularly Flink, but I don't
think that would be difficult to isolate.

I'm in support of disabling the ULR tests and ok with removing the ULR as
long as we make sure it can be revived if we want, like with Mikhail's
suggestion of tagging the commit. I can help with the removal of the ULR
code since I know specifics about the codebase.

On Thu, Mar 14, 2019 at 2:25 PM Kenneth Knowles  wrote:

> I know the Java DirectRunner shares a lot of code with the ULR. I'm a bit
> unclear on the delta and how independent they are.
>
> Kenn
>
> On Thu, Mar 14, 2019 at 2:10 PM Mikhail Gryzykhin 
> wrote:
>
>> @Kenneth
>> If we disable tests, I'd call Java ULR a dead code.
>>
>> One of the better compromises:
>> 1. disable tests.
>> 2. Add tag to the last commit where Java ULR existed.
>> 3. Remove Java ULR from head.
>>
>> Keeping history, no extra dead code at head.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Thu, Mar 14, 2019 at 1:02 PM Ankur Goenka  wrote:
>>
>>> On that note, we should also think about adding PVR for python reference
>>> runners. Jira: https://issues.apache.org/jira/browse/BEAM-6837
>>>
>>>
>>> On Thu, Mar 14, 2019 at 12:57 PM Kenneth Knowles 
>>> wrote:
>>>
>>>> How about this compromise:
>>>>
>>>> 1. disable the test since clearly no one is relying on the
>>>> functionality that is broken
>>>> 2. leave the Java ULR as-is for now, and a volunteer can pick it up and
>>>> make it work if they want
>>>>
>>>> Kenn
>>>>
>>>> On Thu, Mar 14, 2019 at 11:41 AM Mikhail Gryzykhin
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> We have Python PVR Reference post-commit tests failing for quite some
>>>>> time now. These are tests for java reference runner.
>>>>>
>>>>> According to this thread,
>>>>> we are deciding what to do with java reference runner and might want to
>>>>> remove it from code base.
>>>>>
>>>>> My question is: do we want to a) invest time in fixing python PVR
>>>>> tests, or b) disable this test and start cleaning up code?
>>>>>
>>>>> a) Is worth it if we want to invest into java reference runner in the
>>>>> future.
>>>>> b) Is worth it if we want to invest into Python and forfeit java
>>>>> reference runner.
>>>>>
>>>>> Option b) seems more reasonable to me atm, since most people lean
>>>>> towards going forward with Python reference runner.
>>>>>
>>>>> Please, share your thoughts.
>>>>>
>>>>> Regards,
>>>>> --Mikhail
>>>>>
>>>>> Have feedback ?
>>>>>



Re: New Contributor

2019-03-05 Thread Daniel Oliveira
Welcome to Beam Boris!

On Tue, Mar 5, 2019 at 2:03 PM Mikhail Gryzykhin  wrote:

> Welcome to the community!
>
> --Mikhail
>
> Have feedback ?
>
>
> On Tue, Mar 5, 2019 at 1:53 PM Ruoyun Huang  wrote:
>
>> Welcome Boris!
>>
>> On Tue, Mar 5, 2019 at 1:34 PM Ahmet Altay  wrote:
>>
>>> Welcome Boris!
>>>
>>> On Mon, Mar 4, 2019 at 5:40 PM Ismaël Mejía  wrote:
>>>
>>>> Done, welcome!
>>>>
>>>> On Tue, Mar 5, 2019 at 1:25 AM Boris Shkolnik
>>>> wrote:
>>>> >
>>>> >
>>>> > Hi,
>>>> >
>>>> > My name is Boris Shkolnik. I am a committer in Hadoop and Samza
>>>> Apache projects.
>>>> > I would like to contribute to beam.
>>>> > Could you please add me to the beam project.
>>>> >
>>>> > My user name is boryas @apache.org
>>>> >
>>>> > Thanks,
>>>> > -Boris.

>>>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>


Re: Added a Jira beginner's guide to the wiki.

2019-03-01 Thread Daniel Oliveira
That's really useful Udi. I'm not sure if it would fit the beginner's guide
but it's definitely worth writing down, so I made a "Jira Tips" page and
added it there:
https://cwiki.apache.org/confluence/display/BEAM/Jira+Tips
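
For reference, a minimal Python sketch of what the keyword search in Udi's tip (quoted below) does with the query: the browser substitutes it for `%s` in the QuickSearch URL. The function name is made up for illustration, and the exact escaping a browser applies is an assumption (here, `quote_plus`).

```python
from urllib.parse import quote_plus

# URL template taken from the tip; %s is where the search text goes.
QUICK_SEARCH_TEMPLATE = (
    "https://issues.apache.org/jira/secure/QuickSearch.jspa?searchString=%s"
)


def jira_quick_search_url(query: str) -> str:
    """Build the Jira quick-search URL for a query such as 'BEAM-1234'."""
    # str.replace is used instead of %-formatting so that any other
    # percent signs in the template would be left untouched.
    return QUICK_SEARCH_TEMPLATE.replace("%s", quote_plus(query))


if __name__ == "__main__":
    print(jira_quick_search_url("BEAM-1234"))
    print(jira_quick_search_url("beam unresolved udim"))
```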

On Wed, Feb 27, 2019 at 7:40 PM Kenneth Knowles  wrote:

> Genius. I love it. This will save me so much clicking time.
>
> Kenn
>
> On Wed, Feb 27, 2019 at 5:20 PM Udi Meiri  wrote:
>
>> My favorite way to navigate JIRA is using a Chrome search engine.
>> You configure it like this:
>> [image: Screenshot from 2019-02-27 17-11-26.png]
>> (URL is:
>> https://issues.apache.org/jira/secure/QuickSearch.jspa?searchString=%s)
>>
>> And search by writing in the location bar:
>> "j BEAM-1234" will take you to that specific issue
>> "j beam unresolved udim" will show all unresolved issues assigned to udim
>>
>>
>> On Tue, Feb 26, 2019 at 9:22 PM Ahmet Altay  wrote:
>>
>>> Thank you Daniel, this is great information.
>>>
>>> On Fri, Feb 22, 2019 at 11:47 AM Daniel Oliveira 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> In a recent thread in this list I mentioned that it might be nice to
>>>> have a short guide for our Jira on the wiki since there were some aspects
>>>> of Jira that I found a bit unintuitive or not discoverable when I was
>>>> getting into the project. I went ahead and wrote one up and would
>>>> appreciate some feedback, especially from any contributors that may be new
>>>> to Beam and/or Jira.
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/BEAM/Beam+Jira+Beginner%27s+Guide
>>>>
>>>> The main two aspects that I want to make sure I got right are:
>>>>
>>>> 1. Covering details that are often confusing for new contributors, such
>>>> as ways Beam uses Jira that might be unique, or just unintuitive features.
>>>>
>>>> 2. Keeping it very brief and duplicating as little documentation as
>>>> possible. I don't want this to get outdated, so I'd much rather link to a
>>>> source of truth when possible.
>>>>
>>>> If anyone has any details I missed that they'd like to add, or feel
>>>> that they could edit the guide a bit to keep it brief and cut out
>>>> unnecessary info, please go ahead. Also, I'm hoping that this guide could
>>>> be linked from the Contribution Guide
>>>> <https://beam.apache.org/contribute/> on the website if people find it
>>>> useful, so feedback on that front would be great too.
>>>>
>>>


Re: [ANNOUNCE] New committer announcement: Michael Luckey

2019-02-28 Thread Daniel Oliveira
Congrats Michael!

On Thu, Feb 28, 2019 at 3:12 AM Maximilian Michels  wrote:

> Welcome, it's great to have you onboard Michael!
>
> On 28.02.19 11:46, Michael Luckey wrote:
> > Thanks to all of you for the warm welcome. Really happy to be part of
> > this great community!
> >
> > michel
> >
> > On Thu, Feb 28, 2019 at 8:39 AM David Morávek wrote:
> >
> > Congrats Michael! 🍾
> >
> > D.
> >
> >  > On 28 Feb 2019, at 03:27, Ismaël Mejía wrote:
> >  >
> >  > Congratulations Michael, and thanks for all the contributions!
> >  >
> >  >> On Wed, Feb 27, 2019 at 6:30 PM Ankur Goenka wrote:
> >  >>
> >  >> Congratulations Michael!
> >  >>
> >  >>> On Wed, Feb 27, 2019 at 2:25 PM Thomas Weise wrote:
> >  >>>
> >  >>> Congrats Michael!
> >  >>>
> >  >>>
> >  >>>> On Wed, Feb 27, 2019 at 12:41 PM Gleb Kanterov wrote:
> >  >>>>
> >  >>>> Congratulations and welcome!
> >  >>>>
> >  >>>>> On Wed, Feb 27, 2019 at 8:57 PM Connell O'Callaghan wrote:
> >  >>>>>
> >  >>>>> Excellent thank you for sharing Kenn!!!
> >  >>>>>
> >  >>>>> Michael congratulations for this recognition of your
> >  >>>>> contributions to advancing BEAM
> >  >>>>>
> >  >>>>>> On Wed, Feb 27, 2019 at 11:52 AM Kenneth Knowles wrote:
> >  >>>>>>
> >  >>>>>> Hi all,
> >  >>>>>>
> >  >>>>>> Please join me and the rest of the Beam PMC in welcoming a
> >  >>>>>> new committer: Michael Luckey
> >  >>>>>>
> >  >>>>>> Michael has been contributing to Beam since early 2017. He
> >  >>>>>> has fixed many build and developer environment issues, noted and
> >  >>>>>> root-caused breakages on master, generously reviewed many others'
> >  >>>>>> changes to the build. In consideration of Michael's contributions,
> >  >>>>>> the Beam PMC trusts Michael with the responsibilities of a Beam
> >  >>>>>> committer [1].
> >  >>>>>>
> >  >>>>>> Thank you, Michael, for your contributions.
> >  >>>>>>
> >  >>>>>> Kenn
> >  >>>>>>
> >  >>>>>> [1] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> >  >>>>
> >  >>>>
> >  >>>>
> >  >>>> --
> >  >>>> Cheers,
> >  >>>> Gleb
> >
>


Added a Jira beginner's guide to the wiki.

2019-02-22 Thread Daniel Oliveira
Hi everyone,

In a recent thread in this list I mentioned that it might be nice to have a
short guide for our Jira on the wiki since there were some aspects of Jira
that I found a bit unintuitive or not discoverable when I was getting into
the project. I went ahead and wrote one up and would appreciate some
feedback, especially from any contributors that may be new to Beam and/or
Jira.

https://cwiki.apache.org/confluence/display/BEAM/Beam+Jira+Beginner%27s+Guide

The main two aspects that I want to make sure I got right are:

1. Covering details that are often confusing for new contributors, such as
ways Beam uses Jira that might be unique, or just unintuitive features.

2. Keeping it very brief and duplicating as little documentation as
possible. I don't want this to get outdated, so I'd much rather link to a
source of truth when possible.

If anyone has any details I missed that they'd like to add, or feel that
they could edit the guide a bit to keep it brief and cut out unnecessary
info, please go ahead. Also, I'm hoping that this guide could be linked
from the Contribution Guide <https://beam.apache.org/contribute/> on the
website if people find it useful, so feedback on that front would be great
too.


Re: Thoughts on a reference runner to invest in?

2019-02-12 Thread Daniel Oliveira
hink that it is also
>>> > the case, as suggested, that a distributed runner's use of this shared
>>> > library is a better reference point (for other distributed runners)
>>> than
>>> > one using the direct runner (e.g. there is a much more obvious
>>> > delineation between the runner's responsibility and Beam code than in
>>> > the direct runner where the boundaries between orchestration,
>>> execution,
>>> > and other concerns are not as clear).
>>> >
>>> > As well as serving as a reference to runner implementers, the
>>> reference
>>> > runner can also be useful for prototyping (here I think Python holds
>>> an
>>> > advantage, but we're getting into subjective areas now), documenting
>>> (or
>>> > ideally augmenting the documentation of) the spec (here I'd say a
>>> > smaller advantage to Python, but neither runner is clean,
>>> straightforward,
>>> > and documented enough to serve this purpose well yet), and serving as
>>> a
>>> > lightweight universal local runner against which to develop (and,
>>> > possibly use long term in place of a direct runner) new SDKs (here
>>> > you'll get a wide variety of answers whether Python or Java is easier
>>> to
>>> > take on as a dependency for a third language, or we could just package
>>> > it up in a docker image and take docker as a dependency).
>>> >
>>> > Another more pragmatic note is that one thing that helped both the
>>> Flink
>>> > and FnApiRunner forwards is that they were driven forward by actual
>>> > usecases--Lyft has actual Python (necessitating portable) pipelines
>>> they
>>> > want to run on Flink, and the FnApiRunner is the direct runner for
>>> > Python. The Java ULR (at least where it is now) sits in an awkward
>>> place
>>> > where its only role is to be a reference rather than be used, which
>>> (in
>>> > a world of limited resources) makes it harder to justify investment.
>>> >
>>> > - Robert
>>> >
>>> >
>>> >
>>> > On Tue, Feb 12, 2019 at 3:53 AM Kenneth Knowles <k...@apache.org> wrote:
>>> >
>>> > Interesting silence here. You've got it right that the reason we
>>> > initially chose Java was because of the cross-runner sharing. The
>>> > reference runner could be the first target runner for any new
>>> > feature and then its work could be used directly (or indirectly, via
>>> > copy/paste/modify if that works better) in other runners.
>>> > Examples:
>>> >
>>> >   - The implementations of (pre-portability) state & timers in
>>> > runners/core-java and prototyped in the Java DirectRunner made it a
>>> > matter of a couple of days to implement on other runners, and they
>>> > saw pretty quick adoption.
>>> >   - Probably the same could be said for the first drafts of the
>>> > runners, which re-used a bunch of runners/core-java and had each
>>> > others' translation code as a reference.
>>> >
>>> > I'm interested if anyone would be willing to confirm if it is
>>> > because the FlinkRunner has forged ahead and the Dataflow worker is
>>> > open source. It makes sense that the code from a distributed runner
>>> > is an even better reference point if you are building another
>>> > distributed runner. From the look of it, the SamzaRunner had no
>>> > trouble getting started on portability.
>>> >
>>> > Kenn
>>> >
>>> > On Mon, Feb 11, 2019 at 6:04 PM Daniel Oliveira
>>> > <danolive...@google.com> wrote:
>>> >
>>> > Yeah, the FnApiRunner is what I'm leaning towards too. I wasn't
>>> > sure how much demand there was for an actual reference
>>> > implementation in Java though, so I was hoping there were
>>> runner
>>> > authors that would want to chime in.
>>> >
>>> > On the other hand, the Flink runner could serve as a reference
>>> > implementation for portable features since it's further along,
>>> > so maybe it's not an issue regardless.
>>> >
>>> > On Mon,

Re: Thoughts on a reference runner to invest in?

2019-02-11 Thread Daniel Oliveira
Yeah, the FnApiRunner is what I'm leaning towards too. I wasn't sure how
much demand there was for an actual reference implementation in Java
though, so I was hoping there were runner authors that would want to chime
in.

On the other hand, the Flink runner could serve as a reference
implementation for portable features since it's further along, so maybe
it's not an issue regardless.

On Mon, Feb 11, 2019 at 1:09 PM Sam Rohde  wrote:

> Thanks for starting this thread. If I had to guess, I would say there is
> more of a demand for Python as it's more widely used by data scientists and
> for analytics. Being pragmatic, the FnApiRunner already has more feature work
> than the Java runner, so we should go with that.
>
> -Sam
>
> On Fri, Feb 8, 2019 at 10:07 AM Daniel Oliveira 
> wrote:
>
>> Hello Beam dev community,
>>
>> For those who don't know me, I work for Google and I've been working on
>> the Java reference runner, which is a portable, local Java runner (it's
>> basically the direct runner with the portability APIs implemented). Our
>> goal in working on this was to have a portable runner which ran locally so
>> it could be used by users for testing portable pipelines, devs for testing
>> new features with portability, and for runner authors to provide a simple
>> reference implementation of a portable runner.
>>
>> Due to various circumstances though, progress on the Java reference
>> runner has been pretty slow, and a Python runner which does pretty much the
>> same things was made to aid portability development in Python (called the
>> FnApiRunner). This runner is currently further along in feature work than
>> the Java reference runner, so we've been reevaluating if we should switch
>> to investing in it instead.
>>
>> My question to the community is: Which runner do you think would be more
>> valuable to the dev community and Beam users? For those of you who are
>> runner authors, do you have a preference for what language you'd like to
>> see a reference implementation in?
>>
>> Thanks,
>> Daniel Oliveira
>>
>


Re: JIRA priorities explaination

2019-02-11 Thread Daniel Oliveira
Ah, sorry, I missed that Alex was just quoting from our Jira installation
(didn't read his email closely enough). Also, I wasn't aware of those
pages on our website.

Seeing as we do have definitions for our priorities, I guess my main
request would be that they be made more discoverable somehow. I don't think
the tooltips are reliable, and the pages on the website are informative,
but hard to find. Since it feels a bit lazy to say "this isn't discoverable
enough" without suggesting any improvements, I'd like to propose these two
changes:

1. We should write a Beam Jira Guide with basic information about our Jira.
I think the bug priorities should go in here, but also anything else we
would want someone to know before filing any Jira issues, like how our
components are organized or what the different issue types mean. This guide
could either be written in the website or the wiki, but I think it should
definitely be linked in https://beam.apache.org/contribute/ so that
newcomers read it before getting their Jira account approved. The goal here
being to have a reference for the basics of our Jira since at the moment it
doesn't seem like we have anything for this.

2. The existing info on Post-commit and pre-commit policies doesn't seem
very discoverable to someone monitoring the Pre/Post-commits. I've reported
a handful of test-failures already and haven't seen this link mentioned
much. We should try to find a way to funnel people towards this link when
there's an issue, the same way we try to funnel people towards the
contribution guide when they write a PR. As a note, while writing this
email I remembered this link that someone gave me before
(https://s.apache.org/beam-test-failure).
That mentions the Post-commit policies page, so maybe it's just a matter of
pasting that all over our Jenkins builds whenever we have a failing test?

PS: I'm also definitely for SLOs, but I figure it's probably better
discussed in a separate thread so I'm trying to stick to the subject of
priority definitions.

On Mon, Feb 11, 2019 at 9:17 AM Scott Wegner  wrote:

> Thanks for driving this discussion. I also was not aware of these existing
> definitions. Once we agree on the terms, let's add them to our Contributor
> Guide and start using them.
>
> +1 in general; I like both Alex's and Kenn's definitions. Additional
> wordsmithing could be moved to a Pull Request. Can we make the definitions
> useful for both the person filing a bug and the assignee, i.e.
>
> : .
> 
>
> On Sun, Feb 10, 2019 at 7:49 PM Kenneth Knowles  wrote:
>
>> The content that Alex posted* is the definition from our Jira
>> installation anyhow.
>>
>> I just searched around, and there's
>> https://community.atlassian.com/t5/Jira-questions/According-to-Jira-What-is-Blocker-Critical-Major-Minor-and/qaq-p/668774
>> which makes clear that this is really user-defined, since Jira has many
>> deployments with their own configs.
>>
>> I guess what I want to know about this thread is what action is being
>> proposed?
>>
>> Previously, there was a thread that resulted in
>> https://beam.apache.org/contribute/precommit-policies/ and
>> https://beam.apache.org/contribute/postcommits-policies/. These have
>> test failures and flakes as Critical. I agree with Alex that these should
>> be Blocker. They disrupt the work of the entire community, so we need to
>> drop everything and get green again.
>>
>> Other than that, I think what you - Daniel - are suggesting is that the
>> definition might be best expressed as SLOs. I asked on
>> u...@infra.apache.org about how we could have those and the answer is
>> the homebrew
>> https://svn.apache.org/repos/infra/infrastructure/trunk/projects/status/sla/jira/.
>> If anyone has time to dig into that and see if it can work for us, that
>> would be cool.
>>
>> Kenn
>>
>> *Blocker: Blocks development and/or testing work, production could not run
>> Critical: Crashes, loss of data, severe memory leak.
>> Major (Default): Major loss of function.
>> Minor: Minor loss of function, or other problem where easy workaround is
>> present.
>> Trivial: Trivial Cosmetic problem like misspelt words or misaligned text.
>>
>>
>> On Sun, Feb 10, 2019 at 7:20 PM Daniel Oliveira 
>> wrote:
>>
>>> Are there existing meanings for the priorities in Jira already? I wasn't
>>> able to find any info on either the Beam website or wiki about it, so I've
>>> just been prioritizing issues based on gut feeling. If not, I think ha

Re: Is it possible to gracefully close GrpcDataService? [was Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit]

2019-02-10 Thread Daniel Oliveira
This is something I've run into while working on the reference runner and
it's bugged me too. I've tried looking into what the issue was but usually
hit dead ends. Your post is really helpful, I might use it to take another
look when I have the time.

On Fri, Feb 8, 2019 at 5:26 PM Alex Amato  wrote:

> I think graceful shutdown has been historically overlooked, it would not
> surprise me if there are a few things accidentally left out to gracefully
> shutdown the runner harness/sdk.
>
> IIRC there was also some discussion around starting up incorrectly as well
> (requiring a certain order of SDK process startup and runner harness
> startup, which may have had races as well.)
>
> On Fri, Feb 8, 2019 at 4:49 PM Brian Hulette  wrote:
>
>> I think I've finally got a handle on this flake, and a possible solution
>> [1]. One thing that's still bothering me though is that the "CANCELLED:
>> Multiplexer hanging up" errors seem to be unavoidable.
>>
>> They occur when the GrpcDataService is closed [2] and it closes all of
>> its multiplexers, which just send an error to their outbound observers
>> [3]. It seems to me that there should be a more graceful way to shut
>> everything down, but I'm not seeing it. Am I missing something?
>>
>> grpc-java suggests using GrpcCleanupRule to gracefully shut-down
>> in-process servers and clients [4], should we be utilizing that somehow?
>>
>> Brian
>>
>> [1] https://github.com/apache/beam/pull/7794
>> [2]
>> https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/data/GrpcDataService.java#L117
>> [3]
>> https://github.com/apache/beam/tree/master/sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataGrpcMultiplexer.java#L112
>> [4]
>> https://github.com/grpc/grpc-java/blob/master/examples/README.md#unit-test-examples
>>
>> On Thu, Feb 7, 2019 at 11:49 AM Brian Hulette 
>> wrote:
>>
>>> This was already reported in BEAM-6512 [1], which Scott gave me as a
>>> starter bug. I haven't been able to reproduce locally, so I'm trying to see
>>> if I can get it to fail on Jenkins again with some additional logging [2].
>>>
>>> Definitely interested in others' thoughts on this, I only vaguely
>>> understand what's going on. So far the only headway I've made is noticing
>>> that the "CANCELLED: Multiplexer hanging up" error seems to always occur
>>> exactly three times in failing tests. Successful runs may have one or two
>>> of these messages but never three.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6512
>>> [2] https://github.com/apache/beam/pull/7767
>>>
>>> On Tue, Feb 5, 2019 at 9:50 AM Alex Amato  wrote:
>>>

>>>> org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients
>>>>
>>>> I keep seeing this test failing in my PRs
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/
>>>>
>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/
>>>>
>>>> I've seen this one come and go for a few weeks or so. I am unsure
>>>> exactly when it first occurred.

>>>


Re: JIRA priorities explaination

2019-02-10 Thread Daniel Oliveira
Are there existing meanings for the priorities in Jira already? I wasn't
able to find any info on either the Beam website or wiki about it, so I've
just been prioritizing issues based on gut feeling. If not, I think having
some well-defined priorities would be nice, at least for our test-failures,
and especially if we wanna have some SLOs like I've seen being thrown about.

On Fri, Feb 8, 2019 at 3:06 PM Kenneth Knowles  wrote:

> I've been thinking about this since working on the release. If I ignore
> the names I think:
>
> P0: get paged, stop whatever you planned on doing, work late to fix
> P1: continually update everyone on status and shouldn't sit around
> unassigned
> P2: most things here; they can be planned or picked up by whomever
> P3: nice-to-have things, maybe starter tasks or lesser cleanup, but no
> driving need
> Sometimes there's P4 but I don't value it. Often P3 is a deprioritized
> thing from P2, so more involved and complex, while P4 is something easy and
> not important filed just as a reminder. Either way, they are both not on
> the main path of work.
>
> I looked into it and the Jira priority scheme determines the set of
> priorities as well as the default. Ours is shared by 635 projects. Probably
> worth keeping. The default priority is Major which would correspond with
> P2. We can expect the default to be where most issues end up.
>
> P0 == Blocker: get paged, stop whatever you planned on doing, work late to
> fix
> P1 == Critical: continually update everyone on status and shouldn't sit
> around unassigned
> P2 == Major (default): most things here; they can be planned or picked up
> by whomever
> P3 == Minor: nice-to-have things, maybe starter tasks or lesser cleanup,
> but no driving need
> Trivial: Maybe this is attractive to newcomers as it makes it sound easy.
>
> Kenn
>
> On Thu, Feb 7, 2019 at 4:08 PM Alex Amato  wrote:
>
>> Hello Beam community, I was thinking about this and found some
>> information to share/discuss. Would it be possible to confirm my thinking
>> on this:
>>
>>- There are 5 priorities in the JIRA system today (tooltip link):
>>   - *Blocker* Blocks development and/or testing work, production
>>   could not run
>>   - *Critical* Crashes, loss of data, severe memory leak.
>>   - *Major* Major loss of function.
>>   - *Minor* Minor loss of function, or other problem where easy
>>   workaround is present.
>>   - *Trivial* Cosmetic problem like misspelt words or misaligned
>>   text.
>>- How should JIRA issues be prioritized for pre/post commit test
>>failures?
>>   - I think *Blocker*
>>- What about the flakey failures?
>>   - *Blocker* as well?
>>- How should non test issues be prioritized? (E.g. feature to
>>implement or bugs not regularly breaking tests).
>>   - I suggest *Minor*, but it's not clear how to distinguish between
>>   these.
>>
>> Below is my thinking: But I wanted to know what the Apache/Beam community
>> generally thinks about these priorities.
>>
>>- *Blocker*: Expect to be paged. Production systems are down.
>>- *Critical*: Expect to be contacted by email or a bot to fix this.
>>- *Major*: Some loss of function in the repository; issues that
>>need to be addressed soon are here.
>>- *Minor*: Most issues will be here, important issues within this
>>will get picked up and completed. FRs, bugs.
>>- *Trivial*: Unlikely to be implemented, far too many issues in this
>>category. FRs, bugs.
>>
>> Thanks for helping to clear this up
>> Alex
>>
>
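
The P-level mapping Kenn proposes in the quoted message can be written down
as a small lookup table. This is only a sketch: the names PRIORITY_LEVELS,
DEFAULT_PRIORITY and describe are made up for illustration, and the wording
of each expectation paraphrases his email rather than any agreed Beam policy.

```python
# Hypothetical lookup table for the proposal quoted above; this is not an
# official Beam or Jira definition.
PRIORITY_LEVELS = {
    "Blocker": ("P0", "get paged, stop whatever you planned on doing, work late to fix"),
    "Critical": ("P1", "continually update everyone on status; should not sit unassigned"),
    "Major": ("P2", "most things here; can be planned or picked up by whomever"),
    "Minor": ("P3", "nice-to-have things, starter tasks or lesser cleanup"),
}

# Major is the default in the shared Jira priority scheme, per the email.
DEFAULT_PRIORITY = "Major"


def describe(jira_priority=DEFAULT_PRIORITY):
    """Return a 'P-level: expectation' string for a Jira priority name."""
    level, expectation = PRIORITY_LEVELS[jira_priority]
    return f"{level}: {expectation}"
```

Trivial is left out on purpose, since the email does not assign it a P-level.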


Thoughts on a reference runner to invest in?

2019-02-08 Thread Daniel Oliveira
Hello Beam dev community,

For those who don't know me, I work for Google and I've been working on the
Java reference runner, which is a portable, local Java runner (it's
basically the direct runner with the portability APIs implemented). Our
goal in working on this was to have a portable runner which ran locally so
it could be used by users for testing portable pipelines, devs for testing
new features with portability, and for runner authors to provide a simple
reference implementation of a portable runner.

Due to various circumstances though, progress on the Java reference runner
has been pretty slow, and a Python runner which does pretty much the same
things was made to aid portability development in Python (called the
FnApiRunner). This runner is currently further along in feature work than
the Java reference runner, so we've been reevaluating if we should switch
to investing in it instead.

My question to the community is: Which runner do you think would be more
valuable to the dev community and Beam users? For those of you who are
runner authors, do you have a preference for what language you'd like to
see a reference implementation in?

Thanks,
Daniel Oliveira


Re: Enforce javadoc comments in public methods?

2019-01-07 Thread Daniel Oliveira
+1

I like this idea, especially with the line-count requirement. The exact
number of lines is debatable, but you could go as low as 10 lines and that
would exclude any trivial setters and getters. Even better might be if it's
possible to configure checkstyle to ignore this for getters and setters (I
don't know if checkstyle supports this, but I know that other tools are
able to auto-detect getters and setters).

I'm not dead-set against having an annotation to suppress the check, but it
carries the risk that code will be left un-commented because both the dev
and reviewer think it's self-explanatory, and then someone new to the
codebase finds it confusing.
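
To make the proposed rule concrete, here is a rough sketch of the decision it
describes: a method needs Javadoc only when it is public, longer than some
threshold, and not a getter or setter. This is plain Python and not
checkstyle configuration. MethodInfo, needs_javadoc, and the name-prefix
heuristic for accessors are all assumptions made for the example, and the
threshold of 10 is just the lower bound floated above.

```python
from dataclasses import dataclass


@dataclass
class MethodInfo:
    """Hypothetical summary of a Java method, for illustration only."""
    name: str
    is_public: bool
    line_count: int
    has_javadoc: bool


def looks_like_accessor(method):
    # Crude heuristic (an assumption): treat getX/setX names as accessors.
    return method.name.startswith(("get", "set"))


def needs_javadoc(method, min_lines=10):
    """True if the proposed rule would flag this method as missing Javadoc."""
    if method.has_javadoc or not method.is_public:
        return False
    if looks_like_accessor(method):
        return False
    return method.line_count > min_lines
```

Whether checkstyle itself can detect accessors is the open question from the
message above; the prefix check here only stands in for that capability.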

On Mon, Jan 7, 2019 at 11:31 AM Ankur Goenka  wrote:

> I think it makes sense.
> Having an annotation to suppress this check for a method/class instead of
> adding a trivial comment would be useful.
>
> On Mon, Jan 7, 2019 at 9:53 AM Ruoyun Huang  wrote:
>
>> Yeah. Agree there is no reason to enforce anything for trivial methods
>> like setter/getter.
>>
>> What I meant is to enforce only for a method that is *BOTH* 1) a public
>> method and 2) longer than N lines.
>>
>> Sorry for not making the proposal clear enough in the original message;
>> it should've been titled "enforce ... on non-trivial public methods".
>>
>>
>>
>> On Mon, Jan 7, 2019 at 1:31 AM Robert Bradshaw 
>> wrote:
>>
>>> IMHO, requiring comments on trivial methods like setters and getters
>>> is often a net negative, but setting some standard could be useful.
>>>
>>> On Mon, Jan 7, 2019 at 7:35 AM Jean-Baptiste Onofré 
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > for the presence of a comment on public method, it's a good idea. Now,
>>> > about the number of lines, not sure it's a good idea. I'm thinking
>>> about
>>> > the getter/setter which are public. Most of the time, the comment is
>>> > pretty simple (and useless ;)).
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 07/01/2019 04:35, Ruoyun Huang wrote:
>>> > > Hi, everyone,
>>> > >
>>> > >
>>> > > We were wondering whether it is a good idea to make checkstyle
>>> > > enforce public method comments. Our current behavior of JavaDoc
>>> check is:
>>> > >
>>> > >  1. Missing Class javadoc comment is reported as error.
>>> > >  2. Method comment missing is explicitly allowed, see [1]. It is not
>>> > >     even shown as warning.
>>> > >  3. The actual javadoc target gives warning when certain tags are
>>> > >     missing in javadoc, but not if the whole comment is missing.
>>> > >
>>> > >
>>> > > How about we enforce method comments for **1) public method and 2)
>>> > > method that is longer than N lines**. (N=~30 seems a good number,
>>> > > leading to ~50 violations in current repository). I can find out the
>>> > > corresponding contributors to fill in the missing comments, before we
>>> > > turn the check fully on.
>>> > >
>>> > >
>>> > > One caveat though is that we might want to skip this check on test
>>> > > code, but I am not sure yet if our current setup can easily handle
>>> > > separate rules for main code versus test code.
>>> > >
>>> > >
>>> > > Is this a good idea?  Thoughts and suggestions?
>>> > >
>>> > >
>>> > > [1]
>>> > >
>>> https://github.com/apache/beam/blame/5ceffb246c0c38ad68dd208e951a1f39c90ef85c/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L111
>>> > >
>>> > >
>>> > > Cheers,
>>> > >
>>> >
>>> > --
>>> > Jean-Baptiste Onofré
>>> > jbono...@apache.org
>>> > http://blog.nanthrax.net
>>> > Talend - http://www.talend.com
>>>
>>
>>
>> --
>> 
>> Ruoyun  Huang
>>
>>


Re: OOO

2018-12-12 Thread Daniel Oliveira
Thanks for all the work you've been doing on Beam, Luke! Hope you have some
good bonding time and that it's not too hectic.

On Wed, Dec 12, 2018 at 10:10 AM Kenneth Knowles  wrote:

> Congrats & have a super time!
>
> Kenn
>
> On Wed, Dec 12, 2018 at 10:09 AM Robert Burke  wrote:
>
>> Have a great bonding time! I'd say "break" but I expect you'll be quite
>> busy.
>>
>> On Wed, Dec 12, 2018, 9:57 AM Etienne Chauchot 
>> wrote:
>>
>>> Enjoy your family time and take care of the little one
>>>
>>> Etienne
>>>
>>> On Tuesday, December 11, 2018 at 12:26 +0100, Maximilian Michels wrote:
>>>
>>> Thank you for your amazing work on Beam.
>>>
>>>
>>> Enjoy the time with your kid!
>>>
>>>
>>> -Max
>>>
>>>
>>> On 11.12.18 00:55, Pablo Estrada wrote:
>>>
>>> See ya in three months! Take it easy!
>>>
>>>
>>> On Mon, Dec 10, 2018 at 3:27 PM Thomas Weise wrote:
>>>
>>>
>>> Cute :)
>>>
>>>
>>> Enjoy the time with the family.
>>>
>>>
>>> Thomas
>>>
>>>
>>> On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía wrote:
>>>
>>>
>>> Thanks for the community awareness, enjoy the time with the baby and
>>>
>>> see you soon.
>>>
>>>
>>> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik wrote:
>>>
>>>  >
>>>
>>>  > I'll be away for the next three months taking care of my little
>>>  > one[1] and am excited to see what happens within Apache Beam when I
>>>  > return.
>>>  >
>>>  > I have been mainly focusing on the portability and SplittableDoFn
>>>  > efforts. If there are questions while I'm out, feel free to reach out
>>>  > to this dev@ list as there are several community members that have been
>>>  > involved.
>>>
>>>  >
>>>
>>>  > For portability related stuff:
>>>
>>>  > Thomas Weise
>>>
>>>  > Robert Bradshaw
>>>
>>>  > Maximilian Michels
>>>
>>>  > Ankur Goenka
>>>
>>>  >
>>>
>>>  > For SplittableDoFn stuff:
>>>
>>>  > Robert Bradshaw
>>>
>>>  > Ismael Mejia
>>>
>>>  > JB Onofre
>>>
>>>  >
>>>
>>>  > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A
>>>
>>>
>>>


Re: ULR Tests on commit?

2018-12-12 Thread Daniel Oliveira
Yeah, this is in-progress. The tests should get in for various languages
throughout the next two weeks and I'll add them to the PR template as I add
them.

I do have a JIRA for tracking (BEAM-5449
<https://issues.apache.org/jira/browse/BEAM-5449?filter=-1>) but I haven't
been updating it regularly. I'll try to keep it updated.

On Wed, Dec 12, 2018 at 11:03 AM Scott Wegner  wrote:

> +Daniel Oliveira  who has been working on the ULR.
>
> I believe this is in-progress. Dan, do you have a JIRA for tracking?
>
> On Wed, Dec 12, 2018 at 10:08 AM Robert Burke  wrote:
>
>> In our auto populated github PR template, we have a variety of SDK
>> languages to runner combos, but the Universal Local Runner (ULR) is absent.
>>
>> Do we currently run tests on the ULR as pre-commit or post commit? If
>> not, why not?
>>
>> If so, can we add a ULR column to the PR template?
>>
>> Mostly curious. Thanks!
>> Robert Burke
>> @lostluck, distributed gopher wrangler
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


Re: [ANNOUNCE] New committers, October 2018

2018-10-19 Thread Daniel Oliveira
Congratulations!

On Fri, Oct 19, 2018 at 8:27 AM Thomas Weise  wrote:

> Congrats!
>
>
> On Fri, Oct 19, 2018 at 7:24 AM Ismaël Mejía  wrote:
>
>> Congratulations guys and welcome !
>> On Fri, Oct 19, 2018 at 4:12 PM Jean-Baptiste Onofré 
>> wrote:
>> >
>> > Congrats and welcome aboard !
>> >
>> > Regards
>> > JB
>> >
>> > On 19/10/2018 16:09, Kenneth Knowles wrote:
>> > > Hi all,
>> > >
>> > > Hot on the tail of the summer announcement comes our pre-Hallowe'en
>> > > celebration.
>> > >
>> > > Please join me and the rest of the Beam PMC in welcoming the following
>> > > new committers:
>> > >
>> > >  - Xinyu Liu, author/maintainer of the Samza runner
>> > >  - Ankur Goenka, major contributor to portability efforts
>> > >
>> > > And, as before, while I've noted some areas of contribution for each,
>> > > most important is that they are a valued part of our Beam community
>> that
>> > > the PMC trusts with the responsibilities of a Beam committer [1].
>> > >
>> > > A big thanks to both for their contributions.
>> > >
>> > > Kenn
>> > >
>> > > [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>> >
>> > --
>> > Jean-Baptiste Onofré
>> > jbono...@apache.org
>> > http://blog.nanthrax.net
>> > Talend - http://www.talend.com
>>
>


Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-21 Thread Daniel Oliveira
As a non-committer I think some automated squashing of commits sounds best
since it lightens the load of regular contributors, by not having to always
remember to squash, and lightens the load of committers so it doesn't take
as long to have your PR approved by one.

But for now I think the second best route would be making it the PR author's
responsibility to squash fixup commits. Having that expectation described
clearly in the Contributor's Guide, along with some simple step-by-step
instructions for how to do so should be enough. I mainly support this
because I've been doing the squashing myself since I saw a thread about it
here a few months ago. It's not nearly as huge a burden on me as it
probably is for committers who have to merge in many more PRs, it's very
easy to learn how to do, and it's one less barrier to having my code merged
in.

Of course I wouldn't expect that committers wait for PR authors to squash
their fixup commits, but I think leaving a message like "For future pull
requests you should squash any small fixup commits, as described here:
" should be fine.


> I was also thinking about the possibility of wanting to revert
> individual commits from a merge commit. The solution you propose works,
> but only if you want to revert everything.


Does this happen often? I might not have enough context since I'm not a
committer, but it seems to me that often the person performing a revert is
not the original author of a change and doesn't have the context or time to
pick out an individual commit to revert.

On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels  wrote:

> I tend to agree with you Lukasz. Of course we should try to follow the
> guide lines as much as possible but if it requires an extra back and
> forth with the PR author for a cosmetic change, it may not be worth the
> time.
>
> On 19.09.18 22:17, Lukasz Cwik wrote:
> > I have to say I'm guilty of not following the merge guidelines,
> > sometimes doing merges without rebasing/flatten commits.
> >
> > I find that it is a few extra mins of my time to fix someone's PR history
> > if they have more than one logical commit they want to be separate, and
> > it usually takes days for the PR author to do the merging; the extra
> > burden as a committer of keeping track of another PR and its state (waiting
> > for clean-up) is taxing. I really liked the idea of the mergebot (even
> > though it didn't work out in practice) because it could do all the
> > policy work on my behalf.
> >
> > Anything that reduces my overhead as a committer is useful, as for the
> > 100s of PRs that I have merged, I've only had to rollback a couple, so
> > I'm for Charles's suggestion which makes the rollback flow slightly more
> > complicated for a significantly easier PR merge workflow.
> >
> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen wrote:
> >
> > What I mean is that if you get the first-parent commit using "git
> > log --first-parent", it will incorporate any and all fix up commits
> > so we don't need to worry about missing any.
> >
> > On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels wrote:
> >
> > Generally, +1 for isolated commits which are easy to revert.
> >
> >  > I don't think it's actually harder to roll back a set of
> >  > commits that are merged together.
> > I think Thomas was mainly concerned about "fixup" commits landing in
> > master (as part of a merge). These indeed make reverting commits more
> > difficult because you have to check whether you missed a "fixup".
> >
> >  > Ideally every commit should compile and pass tests though, right?
> >
> > That is definitely what we should strive for when doing a merge against
> > master.
> >
> >  > Perhaps the bigger issue is that we need better documentation
> >  > and a playbook on how to do these common tasks in git.
> >
> > We do actually have basic documentation about this but most people don't
> > read it. For example, the commit message of a Merge commit should be:
> >
> > Merge pull request #: [BEAM-] Issue title
> >
> > But most merge commits don't comply with this rule :) See
> > https://beam.apache.org/contribute/committer-guide/#merging-it
> >
> > On 19.09.18 21:34, Reuven Lax wrote:
> >  > Ideally every commit should compile and pass tests though, right?
> >  >
> >  > On Wed, Sep 19, 2018 at 12:15 PM Ankur Goenka
> >  > <goe...@google.com> wrote:
> >  >
> >  > I agree with the cleanliness of the Commit history.
> >  > "Fixup!", "Address comments", "Address even more
> > comments" type of
> >  > comments does not convey meaningful information and are
> >  

Re: [ANNOUNCEMENT] New Beam chair: Kenneth Knowles

2018-09-20 Thread Daniel Oliveira
Congrats Kenn! Sounds like you deserve it!

On Thu, Sep 20, 2018 at 10:20 AM Udi Meiri  wrote:

> Congrats!
>
> On Thu, Sep 20, 2018 at 10:09 AM Raghu Angadi  wrote:
>
>> Congrats Kenn!
>>
>> On Wed, Sep 19, 2018 at 12:54 PM Davor Bonaci  wrote:
>>
>>> Hi everyone --
>>> It is with great pleasure that I announce that at today's meeting of the
>>> Foundation's Board of Directors, the Board has appointed Kenneth Knowles as
>>> the second chair of the Apache Beam project.
>>>
>>> Kenn has served on the PMC since its inception, and is very active and
>>> effective in growing the community. His exemplary posts have been cited in
>>> other projects. I'm super happy to have Kenn accepted the nomination, and
>>> I'm confident that he'll serve with distinction.
>>>
>>> As for myself, I'm not going anywhere. I'm still around and will be as
>>> active as I have recently been. Thrilled to be able to pass the baton to
>>> such a key member of this community and to have less administrative work to
>>> do ;-).
>>>
>>> Please join me in welcoming Kenn to his new role, and I ask that you
>>> support him as much as possible. As always, please let me know if you have
>>> any questions.
>>>
>>> Davor
>>>
>>


Re: jira search in chrome omnibox

2018-08-28 Thread Daniel Oliveira
This seems pretty useful. Thanks Udi!

On Mon, Aug 27, 2018 at 3:54 PM Udi Meiri  wrote:

> In case you want to quickly look up JIRA tickets, e.g., typing 'j', space,
> 'BEAM-4696'.
> Search URL:
> https://issues.apache.org/jira/QuickSearch.jspa?searchString=%s
>
>
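As a rough illustration of what the browser does with that search URL: it replaces the %s placeholder with whatever is typed after the keyword. The sketch below is plain Python rather than anything Chrome-specific, and the helper name jira_quick_search_url is made up for the example.

```python
from urllib.parse import quote

# Search URL template quoted in the thread; %s is the browser's placeholder.
SEARCH_URL = "https://issues.apache.org/jira/QuickSearch.jspa?searchString=%s"


def jira_quick_search_url(query: str) -> str:
    """Return the URL that would be opened for the given query text.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the query so spaces and punctuation stay valid in a URL.
    return SEARCH_URL.replace("%s", quote(query, safe=""))


print(jira_quick_search_url("BEAM-4696"))
# → https://issues.apache.org/jira/QuickSearch.jspa?searchString=BEAM-4696
```

A ticket key such as BEAM-4696 passes through unchanged; free-text queries get their spaces encoded as %20.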


Re: Process JobBundleFactory for portable runner

2018-08-15 Thread Daniel Oliveira
I just want to clarify that I understand this correctly since I'm not that
familiar with the details behind all these execution environments yet. Is
the proposal to create a new JobBundleFactory that instead of using Docker
to create the environment that the new processes will execute in, this
JobBundleFactory would execute the new processes directly in the host
environment? So in practice if I ran a pipeline with this JobBundleFactory
the SDK Harness and Runner Harness would both be executing directly on my
machine and would depend on me having the dependencies already present on
my machine?

On Mon, Aug 13, 2018 at 5:50 PM Ankur Goenka  wrote:

> Thanks for starting the discussion. I will be happy to help.
> I agree, we should have a pluggable SDKHarness environment factory.
> We can register multiple environment factories using a service registry and
> use a PipelineOption to pick the right one on a per-job basis.
>
> There are a couple of things which are required to set up before launching
> the process.
>
>- Setting up the environment as done in boot.go [4]
>- Retrieving and putting the artifacts in the right location.
>
> You can probably leverage boot.go code to set up the environment.
>
> Also, it will be useful to enumerate pros and cons of different
> Environments to help users choose the right one.
>
>
> On Mon, Aug 6, 2018 at 4:50 PM Thomas Weise  wrote:
>
>> Hi,
>>
>> Currently the portable Flink runner only works with SDK Docker containers
>> for execution (DockerJobBundleFactory, besides an in-process (embedded)
>> factory option for testing [1]). I'm considering adding another out of
>> process JobBundleFactory implementation that directly forks the processes
>> on the task manager host, eliminating the need for Docker. This would work
>> reasonably well in environments where the dependencies (in this case
>> Python) can easily be tied into the host deployment (also within an
>> application specific Kubernetes pod).
>>
>> There was already some discussion about alternative JobBundleFactory
>> implementation in [2]. There is also a JIRA to make the bundle factory
>> pluggable [3], pending availability of runner level options.
>>
>> For a "ProcessBundleFactory", in addition to the Python dependencies the
>> environment would also need to have the Go boot executable [4] (or a
>> substitute thereof) to perform the harness initialization.
>>
>> Is anyone else interested in this SDK execution option or has already
>> investigated an alternative implementation?
>>
>> Thanks,
>> Thomas
>>
>> [1]
>> https://github.com/apache/beam/blob/7958a379b0a37a89edc3a6ae4b5bc82fda41fcd6/runners/flink/src/test/java/org/apache/beam/runners/flink/PortableExecutionTest.java#L83
>>
>> [2]
>> https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
>>
>> [3] https://issues.apache.org/jira/browse/BEAM-4819
>>
>> [4]
>> https://github.com/apache/beam/blob/master/sdks/python/container/boot.go
>>
>>
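The pluggable-factory idea in this thread (register several environment factories, then pick one per job through an option) could look roughly like the sketch below. It is plain Python, not Beam code: the registry, the decorator, the "docker" and "process" keys, and the example image and path are all hypothetical, and nothing is actually launched.

```python
from typing import Callable, Dict, List

# Hypothetical registry: environment type name -> function that builds the
# command used to start the SDK harness. Not part of any Beam API.
ENVIRONMENT_FACTORIES: Dict[str, Callable[[str], List[str]]] = {}


def register(name: str):
    """Decorator that registers a command builder under the given name."""
    def wrap(fn: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
        ENVIRONMENT_FACTORIES[name] = fn
        return fn
    return wrap


@register("docker")
def docker_command(harness: str) -> List[str]:
    # Docker-based environment: `harness` is assumed to be a container image.
    return ["docker", "run", harness]


@register("process")
def process_command(harness: str) -> List[str]:
    # Process-based environment: `harness` is assumed to be the path of a boot
    # executable already installed on the host, so it is started directly.
    return [harness]


def launch_command(environment_type: str, harness: str) -> List[str]:
    """Pick the factory named by a per-job option and build its command."""
    if environment_type not in ENVIRONMENT_FACTORIES:
        raise ValueError(f"unknown environment type: {environment_type}")
    return ENVIRONMENT_FACTORIES[environment_type](harness)


print(launch_command("docker", "example/python-sdk-image"))
print(launch_command("process", "/opt/beam/boot"))
```

The process variant only works when the host already has the dependencies the Docker image would otherwise provide, which is the trade-off raised above.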


Re: Removing documentation for old Beam versions

2018-08-02 Thread Daniel Oliveira
The older docs should be recorded in the commit history of the website
repository, right? If they're not currently used in the website and they're
in the commit history then I don't see a reason to save them.

On Tue, Jul 31, 2018 at 1:51 PM Udi Meiri  wrote:

> Hi all,
> I'm writing a PR for apache/beam-site and beam_PreCommit_Website_Stage is
> timing out after 100 minutes, because it's trying to delete 22k files and
> then copy 22k files (warning: large file).
>
> It seems that we could save a lot of time by deleting the older javadoc
> and pydoc files for older versions. Is there a good reason to keep around
> this kind of documentation for older versions (say 1 year back)?
>


Re: [DISCUSS] Automation for Java code formatting

2018-06-27 Thread Daniel Oliveira
+1 I'll throw in my support for auto-formatting, especially if the entire
project is auto-formatted in advance.

On Wed, Jun 27, 2018 at 10:53 AM Huygaa Batsaikhan 
wrote:

> +1. Global auto-formatting is cool!
>
> On Wed, Jun 27, 2018 at 10:17 AM Kenneth Knowles  wrote:
>
>> I just mean that because of how the tool works. But I guess if there were
>> discretion then two different people could end up with autoformatting that
>> disagrees, so again you get lines in the PR diff that aren't real changes.
>>
>> Kenn
>>
>> On Wed, Jun 27, 2018 at 10:16 AM Raghu Angadi  wrote:
>>
>>> On Wed, Jun 27, 2018 at 10:13 AM Kenneth Knowles  wrote:
>>>
 Nope! No discretion allowed :-)

>>>
>>> +1. Fair enough!
>>>
>>>

 On Wed, Jun 27, 2018 at 9:57 AM Raghu Angadi 
 wrote:

> +1.
>
> Wondering if it can be configured to reformat only what we care most
> about (2 space indentation etc), allowing some discretion on the edges. An
> example of inconsistent formatting that ends up in my code:
> ---
> anObject.someLongMethodName(arg_number_1,
>arg_number_2);
> --- vs ---
> anObject.anotherMethodName(
>   arg_number_1,
>   arg_number_2
> );
>
>
> On Wed, Jun 27, 2018 at 9:41 AM Lukasz Cwik  wrote:
>
>> It wasn't clear to me that the intent was to autoformat all the code
>> from the proposal initially. If that's the case, then the delta is quite
>> small typically.
>>
>> Also, it would be easier if we recommended to users to run
>> "./gradlew spotlessApply" which will run spotless on all modules.
>>
>> On Wed, Jun 27, 2018 at 9:31 AM Kenneth Knowles 
>> wrote:
>>
>>> Luke: the proposal here solves exactly what you are talking about.
>>>
>>> The problem you describe happens when the PR author uses autoformat
>>> but the baseline is not already autoformatted. What I am proposing is to
>>> make sure the baseline is already autoformatted, so PRs never have
>>> extraneous formatting changes.
>>>
>>> Rafael: the default setting on GitHub is "allow edits by
>>> maintainers" so actually a committer can run spotless on behalf of a
>>> contributor and push the fixup. I have done this. It also lets a
>>> committer fix up a good PR and merge it even if the contributor is, say,
>>> asleep.
>>>
>>> Kenn
>>>
>>> On Wed, Jun 27, 2018 at 9:24 AM Rafael Fernandez <
>>> rfern...@google.com> wrote:
>>>
 Luke: Anything that helps contributors and reviewers work better
 together - +1! :D



 On Wed, Jun 27, 2018 at 9:04 AM Lukasz Cwik 
 wrote:

> If spotless is run against a PR that is already well formatted it's
> a non-issue as the formatting changes are usually related to the change but
> I have reviewed a few PRs that have 100s of lines of formatting change
> which really obfuscates the work.
> Instead of asking contributors to run spotless, can we have a cron
> job run it across the project like once a day/week/... and cut a PR?
>
> On Wed, Jun 27, 2018 at 8:07 AM Kenneth Knowles 
> wrote:
>
>> Good points, Dan. Checkstyle will still run, but just focused on
>> the things that go beyond format.
>>
>> Kenn
>>
>> On Wed, Jun 27, 2018 at 8:03 AM Etienne Chauchot <
>> echauc...@apache.org> wrote:
>>
>>> +1 !
>>> It's my custom to avoid reformatting to spare meaningless diff
>>> burden to the reviewer. Now it will be over, thanks.
>>>
>>> Etienne
>>>
>>> Le mardi 26 juin 2018 à 21:15 -0700, Kenneth Knowles a écrit :
>>>
>>> Hi all,
>>>
>>> I like readable code, but I don't like formatting it myself. And
>>> I _really_ don't like discussing in code review. "Spotless" [1] can enforce
>>> - and automatically apply - automatic formatting for Java, Groovy, and some
>>> others.
>>>
>>> This is not about style or wanting a particular layout. This is
>>> about automation, contributor experience, and streamlining review
>>>
>>>  - Contributor experience: MUCH better than checkstyle: error
>>> message just says "run ./gradlew :beam-your-module:spotlessApply" instead
>>> of telling them to go in and manually edit.
>>>
>>>  - Automation: You want to use autoformat so you don't have to
>>> format code by hand. But if you autoformat a file that was in some other
>>> format, then you touch a bunch of unrelated lines. If the file is already
>>> autoformatted, it is much better.
>>>
>>>  - Review: Never talk abo

Re: The full list of proposals / prototype documents

2018-05-23 Thread Daniel Oliveira
+1 to web site page (not Google Doc).

Definitely agree that a common entry point would be excellent. I don't like
the idea of the Google Doc so much because it's not very good for having
changes reviewed and keeping track of who added what, unlike Github. Adding
an entry to the list in the website would require reviews and leave behind
a commit history, which I think is important for an authoritative source
like this.

PS: I also have a doc I proposed that I didn't see in the lists:
https://s.apache.org/beam-runner-api-combine-model

On Wed, May 23, 2018 at 12:52 PM Lukasz Cwik  wrote:

> +1, Thanks for picking this up Alexey
>
> On Wed, May 23, 2018 at 10:41 AM Huygaa Batsaikhan 
> wrote:
>
>> +1. That is great, Alexey. Robin and I are working on documenting some
>> missing pieces of Java SDK. We will let you know when we create polished
>> documents.
>>
>> On Wed, May 23, 2018 at 9:28 AM Ismaël Mejía  wrote:
>>
>>> +1 and thanks for volunteering for this Alexey.
>>> We really need to make this more accessible.
>>> On Wed, May 23, 2018 at 6:00 PM Alexey Romanenko <
>>> aromanenko@gmail.com>
>>> wrote:
>>>
>>> > Joseph, Eugene - thank you very much for the links!
>>>
>>> > All, regarding one common entry point for all design documents. Could
>>> we
>>> just have a dedicated page on Beam web site with a list of links to every
>>> proposed document? Every entry (optionally) might contain, in addition,
>>> short abstract and list of author(s). In this case, it would be easily
>>> searchable and available for those who are interested in this.
>>>
>>> > At the same time, using a Google doc for writing/discussing the
>>> documents
>>> seems more than reasonable since it’s quite native and easy to use. I
>>> only
>>> propose to have a common entry point for all of them.
>>>
>>> > If this idea looks feasible, I’d propose myself to collect the links to
>>> already created documents, create such page and update this list in the
>>> future.
>>>
>>> > WBR,
>>> > Alexey
>>>
>>> > On 22 May 2018, at 21:34, Eugene Kirpichov 
>>> wrote:
>>>
>>> > Making it easier to manage indeed would be good. Could someone from PMC
>>> please add the following documents of mine to it?
>>>
>>> > SDF related documents:
>>> > http://s.apache.org/splittable-do-fn
>>> > http://s.apache.org/sdf-via-source
>>> > http://s.apache.org/textio-sdf
>>> > http://s.apache.org/beam-watch-transform
>>> > http://s.apache.org/beam-breaking-fusion
>>>
>>> > Non SDF related:
>>> > http://s.apache.org/context-fn
>>> > http://s.apache.org/fileio-write
>>>
>>> > A suggestion: maybe we can establish a convention to send design
>>> document
>>> proposals to dev+desi...@beam.apache.org? Does the Apache mailing list
>>> management software support this kind of stuff? Then they'd be quite easy
>>> to find and filter.
>>>
>>> > On Tue, May 22, 2018 at 10:57 AM Kenneth Knowles 
>>> wrote:
>>>
>>> >> It is owned by the Beam PMC collectively. Any PMC member can add
>>> things
>>> to it. Ideas for making it easy to manage are welcome.
>>>
>>> >> Probably easier to have a markdown file somewhere with a list of docs
>>> so
>>> we can issue and review PRs. Not sure the web site is the right place for
>>> it - we have a history of porting docs to markdown but really that is
>>> high
>>> overhead and users/community probably don't gain from it so much. Some
>>> have
>>> suggested a wiki.
>>>
>>> >> Kenn
>>>
>>> >> On Tue, May 22, 2018 at 10:22 AM Scott Wegner 
>>> wrote:
>>>
>>> >>> Thanks for the links. Any details on that Google drive folder? Who
>>> maintains it? Is it possible for any contributor to add their design doc?
>>>
>>> >>> On Mon, May 21, 2018 at 8:15 AM Joseph PENG <
>>> josephtengp...@gmail.com>
>>> wrote:
>>>
>>>  Alexey,
>>>
>>>  I do not know where you can find all design docs, but I know a blog
>>> that has collected some of the major design docs. Hope it helps.
>>>
>>>  https://wtanaka.com/beam/design-doc
>>>
>>>  https://drive.google.com/drive/folders/0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>
>>>  On Mon, May 21, 2018 at 9:28 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>> > Hi all,
>>>
>>> > Is it possible to obtain somewhere a list of all proposals /
>>> prototype documents that have been published as a technical / design
>>> documents for new features? I have links to only some of them (found in
>>> mail list discussions by chance) but I’m not aware of others.
>>>
>>> > If yes, could someone share it or point me out where it is located
>>> in
>>> case if I missed this?
>>>
>>> > If not, don’t you think it would make sense to have such index of
>>> these documents? I believe it can be useful for Beam contributors since
>>> these proposals contain information which is absent or not so detailed on
>>> Beam web site documentation.
>>>
>>> > WBR,
>>> > Alexey
>>>
>>


Proposed change to Portable Combine Spec - Adding a new URN

2018-05-23 Thread Daniel Oliveira
Hi everyone,

This email should be relevant to anyone interested in the portable pipeline
model. A few months ago I sent out an email with this doc describing my
ideas for modelling portable combines that support lifting:
https://s.apache.org/beam-runner-api-combine-model

Recently, after some offline discussion with other devs working on
portability I'd like to add a new URN to the spec and I figured I should
update the dev list again to get any feedback on the idea. If no one has
any problems with the proposal I'm hoping to add this to the doc in a few
days.

*Proposal:*
The doc currently only has one way to execute unlifted combines: Execute
the ParDo within the CombinePerKey composite provided by the SDK.

The proposal is to add a second way to execute unlifted combines: Adding a
URN to represent an unlifted combine step executed after a GroupByKey
transform, tentatively named "beam:transform:combine_grouped_values". Just
like the other combine parts listed in the doc, the URN would be in a
PTransform along with a CombineFn.

*Reasoning:*
Under the original spec the only way to execute unlifted combines is to
execute a ParDo containing the logic of that combine. In the best case this
is very straightforward: The Runner receives a CombinePerKey and sends it
to the SDK Harness for execution without changing anything and the ParDo
will execute the full combine.

However, this causes issues when sending the provided ParDo for execution
isn't straightforward. Situations may come up, due to runner implementation
details, where a CombineGroupedValues needs to be executed and the ParDo
associated with it is not easily retrieved or doesn't exist. This new URN
provides a backup option so that a full combine can be executed even then.
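To make the unlifted case concrete, here is a minimal plain-Python sketch of what a step tagged with the proposed URN would compute: it takes the output of a GroupByKey and applies a CombineFn to each key's values. The method names mirror the usual CombineFn shape (create_accumulator, add_input, extract_output), but the classes and functions are stand-ins written for this example and are not the Beam SDK or the runner API.

```python
from collections import defaultdict


class MeanCombineFn:
    """Stand-in CombineFn that computes a mean; not the Beam class."""

    def create_accumulator(self):
        # Accumulator is (running sum, element count).
        return (0.0, 0)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def group_by_key(pairs):
    """Toy GroupByKey: collect every value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_grouped_values(grouped, combine_fn):
    """Unlifted combine: run the whole CombineFn over each key's values."""
    result = {}
    for key, values in grouped.items():
        accumulator = combine_fn.create_accumulator()
        for value in values:
            accumulator = combine_fn.add_input(accumulator, value)
        result[key] = combine_fn.extract_output(accumulator)
    return result


pairs = [("a", 1), ("b", 10), ("a", 3)]
print(combine_grouped_values(group_by_key(pairs), MeanCombineFn()))
# → {'a': 2.0, 'b': 10.0}
```

Because the step only needs the CombineFn, a runner could execute it even when the SDK-provided ParDo is not available, which is the fallback the proposal describes.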

Thank you,
Daniel Oliveira


Re: Gradle Status [April 6]

2018-04-13 Thread Daniel Oliveira
Ah, I see. I was attempting to replicate Alexey's issues with DirectRunner
tests, so I wasn't checking for that 3s overhead. A quick test for me shows
that I also get that several second overhead when running tests with
Gradle, ranging from 3-6 sec. However, on my machine this delay is
constant, regardless of whether I use the Gradle Runner or Platform Runner.

I think getting more info from someone more knowledgeable about Gradle is a
good idea; I'm still very new to Gradle, which is why my ability to help
with the Gradle effort is mainly limited to helping with documentation.

On Thu, Apr 12, 2018 at 9:41 PM Romain Manni-Bucau 
wrote:

> When you launch a test with gradle runner it launches gradle which makes
> you lose 3s on a very fast computer and more on a slower one (6 on my personal
> one which is already fast but not as much as my work one). We are 5 to see that
> regression at least. So there is a reason to not use the gradle runner if
> possible because when you work and need to debug you are just stuck (that
> is why I switched back to mvn after 15mn, I was losing too much time).
>
> Switching back to native idea test run would fix it but tests just don't
> work this way for me whatever setup I do :( - missing resources IIRC in
> idea out dir.
>
> Le 13 avr. 2018 00:07, "Reuven Lax"  a écrit :
>
> I also don't quite understand what your question is, and it appears like
> Dan spent considerable time trying to reproduce your issue. For the record,
> I have had no issues running tests via Gradle in IntelliJ for the past few
> weeks.
>
> Reuven
>
> On Thu, Apr 12, 2018 at 9:47 PM Daniel Oliveira 
> wrote:
>
>> Sorry Romain, I'm not quite sure what you're asking. Can you clarify?
>>
>> On Thu, Apr 12, 2018 at 12:22 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Well you are the only one to not have the drawbacks to use it so maybe
>>> don't do it? I know Luke is on holiday but anyone else with the knowledge
>>> of why we need that noise compared to idea native tooling/flow?
>>>
>>> Le 12 avr. 2018 20:16, "Daniel Oliveira"  a
>>> écrit :
>>>
>>>> Ah, I did not. Thanks Romain.
>>>>
>>>> I tried it again, restarting in between, and still had no differences.
>>>> Since it seems like there's no reason not to use "Gradle Test Runner", I'll
>>>> mention it in the contributor's guide.
>>>>
>>>> On Thu, Apr 12, 2018 at 10:31 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> @Daniel: did you restart in between? Otherwise it does nothing. One
>>>>> launches JunitCoreRunner from idea and the other a gradle command.
>>>>>
>>>>> Le 12 avr. 2018 19:24, "Daniel Oliveira"  a
>>>>> écrit :
>>>>>
>>>>>> I think it depends on what exactly switching to "Gradle Test Runner"
>>>>>> from "Platform Test Runner" does. I tried it out on my machine and they
>>>>>> seem to act identically to each other. The IntelliJ documentation says it
>>>>>> determines what API to use to run the tests
>>>>>> <https://www.jetbrains.com/help/idea/runner.html>, so maybe its
>>>>>> usefulness depends on the user's machine, in which case a note about that
>>>>>> would be useful. Something like: "If your IDE has trouble running tests via
>>>>>> IDEA shortcuts, try the following steps: [...]"
>>>>>>
>>>>>> On Thu, Apr 12, 2018 at 3:29 AM Alexey Romanenko <
>>>>>> aromanenko@gmail.com> wrote:
>>>>>>
>>>>>>> Daniel, actually I did run it with default IDEA JUnit test runner.
>>>>>>> Then, in “Settings > Build, Execution, Deployment > Build Tools > Gradle >
>>>>>>> Runner” I selected “Gradle Test Runner” in “Run tests using” selectbox and
>>>>>>> it works ok when I run my tests with IDEA shortcuts. So, probably, we
>>>>>>> should add these details on
>>>>>>> https://beam.apache.org/contribute/intellij/ too.
>>>>>>> What do you think?
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey
>>>>>>>
>>>>>>> On 11 Apr 2018, at 21:17, Daniel Oliveira 
>>>>>>> w

Re: Gradle Status [April 6]

2018-04-12 Thread Daniel Oliveira
Sorry Romain, I'm not quite sure what you're asking. Can you clarify?

On Thu, Apr 12, 2018 at 12:22 PM Romain Manni-Bucau 
wrote:

> Well you are the only one to not have the drawbacks to use it so maybe
> don't do it? I know Luke is on holiday but anyone else with the knowledge
> of why we need that noise compared to idea native tooling/flow?
>
> Le 12 avr. 2018 20:16, "Daniel Oliveira"  a
> écrit :
>
>> Ah, I did not. Thanks Romain.
>>
>> I tried it again, restarting in between, and still had no differences.
>> Since it seems like there's no reason not to use "Gradle Test Runner", I'll
>> mention it in the contributor's guide.
>>
>> On Thu, Apr 12, 2018 at 10:31 AM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> @Daniel: did you restart in between? Otherwise it does nothing. One
>>> launches JunitCoreRunner from idea and the other a gradle command.
>>>
>>> Le 12 avr. 2018 19:24, "Daniel Oliveira"  a
>>> écrit :
>>>
>>>> I think it depends on what exactly switching to "Gradle Test Runner"
>>>> from "Platform Test Runner" does. I tried it out on my machine and they
>>>> seem to act identically to each other. The IntelliJ documentation says it
>>>> determines what API to use to run the tests
>>>> <https://www.jetbrains.com/help/idea/runner.html>, so maybe its
>>>> usefulness depends on the user's machine, in which case a note about that
>>>> would be useful. Something like: "If your IDE has trouble running tests via
>>>> IDEA shortcuts, try the following steps: [...]"
>>>>
>>>> On Thu, Apr 12, 2018 at 3:29 AM Alexey Romanenko <
>>>> aromanenko@gmail.com> wrote:
>>>>
>>>>> Daniel, actually I did run it with default IDEA JUnit test runner.
>>>>> Then, in “Settings > Build, Execution, Deployment > Build Tools > Gradle >
>>>>> Runner” I selected “Gradle Test Runner” in “Run tests using” selectbox and
>>>>> it works ok when I run my tests with IDEA shortcuts. So, probably, we
>>>>> should add these details on
>>>>> https://beam.apache.org/contribute/intellij/ too.
>>>>> What do you think?
>>>>>
>>>>> WBR,
>>>>> Alexey
>>>>>
>>>>> On 11 Apr 2018, at 21:17, Daniel Oliveira 
>>>>> wrote:
>>>>>
>>>>> Alexey, are you referring to tests run with "./gradlew
>>>>> :beam-runners-direct-java:needsRunnerTests"? That command works fine for me
>>>>> in both versions of IDEA, but I believe the same tests fail if you run them
>>>>> directly through "./gradlew test".
>>>>>
>>>>> However, I am having issues with a bunch of validatesRunner tests,
>>>>> mostly caused by
>>>>> :beam-runners-google-cloud-dataflow-java:validatesRunner. Not sure if it's
>>>>> because of a code change or a gradle config, I'll keep looking into it.
>>>>>
>>>>> On Wed, Apr 11, 2018 at 11:01 AM Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> I got tests running by reconfiguring gradle (which was set up for another
>>>>>> project but seems beam didn't like it) but latency is still "high" using
>>>>>> gradle runner for tests (like Etienne said ~3s on an i7 with 16G vs a few
>>>>>> ms with default idea test runner, would be great to solve that).
>>>>>>
>>>>>> I also find the integration quite fishy because configurations are
>>>>>> custom so idea is kind of forced to propose your modules 3 times at least
>>>>>> when you select the classpath (x_test being generally the working one).
>>>>>>
>>>>>> Also the false positive you get if you forget a cleanX is a bit
>>>>>> annoying, maybe we should force a clean for test or at least when there is
>>>>>> a --tests to avoid gradle not running it because there is no diff.
>>>>>>
>>>>>> So it works but dev productivity is reduced a lot and it became slow
>>>>>> enough to think if you should do a contribution or not - at least for me.
>>>>>>
>>>>>> Le 1

Re: Gradle Status [April 6]

2018-04-12 Thread Daniel Oliveira
Ah, I did not. Thanks Romain.

I tried it again, restarting in between, and still had no differences.
Since it seems like there's no reason not to use "Gradle Test Runner", I'll
mention it in the contributor's guide.

On Thu, Apr 12, 2018 at 10:31 AM Romain Manni-Bucau 
wrote:

> @Daniel: did you restart in between? Otherwise it does nothing. One
> launches JunitCoreRunner from idea and the other a gradle command.
>
> Le 12 avr. 2018 19:24, "Daniel Oliveira"  a
> écrit :
>
>> I think it depends on what exactly switching to "Gradle Test Runner" from
>> "Platform Test Runner" does. I tried it out on my machine and they seem to
>> act identically to each other. The IntelliJ documentation says it
>> determines what API to use to run the tests
>> <https://www.jetbrains.com/help/idea/runner.html>, so maybe its
>> usefulness depends on the user's machine, in which case a note about that
>> would be useful. Something like: "If your IDE has trouble running tests via
>> IDEA shortcuts, try the following steps: [...]"
>>
>> On Thu, Apr 12, 2018 at 3:29 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Daniel, actually I did run it with default IDEA JUnit test runner. Then,
>>> in “Settings > Build, Execution, Deployment > Build Tools > Gradle >
>>> Runner” I selected “Gradle Test Runner” in “Run tests using” selectbox and
>>> it works ok when I run my tests with IDEA shortcuts. So, probably, we
>>> should add these details on https://beam.apache.org/contribute/intellij/
>>> too.
>>> What do you think?
>>>
>>> WBR,
>>> Alexey
>>>
>>> On 11 Apr 2018, at 21:17, Daniel Oliveira 
>>> wrote:
>>>
>>> Alexey, are you referring to tests run with "./gradlew
>>> :beam-runners-direct-java:needsRunnerTests"? That command works fine for me
>>> in both versions of IDEA, but I believe the same tests fail if you run them
>>> directly through "./gradlew test".
>>>
>>> However, I am having issues with a bunch of validatesRunner tests,
>>> mostly caused by
>>> :beam-runners-google-cloud-dataflow-java:validatesRunner. Not sure if it's
>>> because of a code change or a gradle config, I'll keep looking into it.
>>>
>>> On Wed, Apr 11, 2018 at 11:01 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> I got tests running by reconfiguring gradle (which was set up for another
>>>> project but seems beam didn't like it) but latency is still "high" using
>>>> gradle runner for tests (like Etienne said ~3s on an i7 with 16G vs a few
>>>> ms with default idea test runner, would be great to solve that).
>>>>
>>>> I also find the integration quite fishy because configurations are
>>>> custom so idea is kind of forced to propose your modules 3 times at least
>>>> when you select the classpath (x_test being generally the working one).
>>>>
>>>> Also the false positive you get if you forget a cleanX is a bit
>>>> annoying, maybe we should force a clean for test or at least when there is
>>>> a --tests to avoid gradle not running it because there is no diff.
>>>>
>>>> So it works but dev productivity is reduced a lot and it became slow
>>>> enough to think if you should do a contribution or not - at least for me.
>>>>
>>>> Le 11 avr. 2018 19:37, "Alexey Romanenko"  a
>>>> écrit :
>>>>
>>>>> I’ve managed to import a project as it’s described in documentation
>>>>> (starting from empty project) using Idea 2018 and run unit tests
>>>>> successfully.
>>>>> For some reason, tests that use DirectRunner to run a pipeline
>>>>> failed.
>>>>>
>>>>> WBR,
>>>>> Alexey
>>>>>
>>>>> On 11 Apr 2018, at 19:01, Daniel Oliveira 
>>>>> wrote:
>>>>>
>>>>> Hi everyone, I was the one who initially wrote the PR with Idea
>>>>> instructions <https://github.com/apache/beam-site/pull/414>. I was
>>>>> using 2017.3 as well while writing it so all the instructions were tested
>>>>> on that version. I'll try testing the instructions on 2018 to see if I can
>>>>> reproduce the issues people are having.
>>>>>
>>>>> On Wed, Apr 11, 2018 at 9:51 AM 

Re: Gradle Status [April 6]

2018-04-12 Thread Daniel Oliveira
I think it depends on what exactly switching to "Gradle Test Runner" from
"Platform Test Runner" does. I tried it out on my machine and they seem to
act identically to each other. The IntelliJ documentation says it
determines what API to use to run the tests
<https://www.jetbrains.com/help/idea/runner.html>, so maybe its usefulness
depends on the user's machine, in which case a note about that would be
useful. Something like: "If your IDE has trouble running tests via IDEA
shortcuts, try the following steps: [...]"

On Thu, Apr 12, 2018 at 3:29 AM Alexey Romanenko 
wrote:

> Daniel, actually I did run it with default IDEA JUnit test runner. Then,
> in “Settings > Build, Execution, Deployment > Build Tools > Gradle >
> Runner” I selected “Gradle Test Runner” in “Run tests using” selectbox and
> it works ok when I run my tests with IDEA shortcuts. So, probably, we
> should add these details on https://beam.apache.org/contribute/intellij/
> too.
> What do you think?
>
> WBR,
> Alexey
>
> On 11 Apr 2018, at 21:17, Daniel Oliveira  wrote:
>
> Alexey, are you referring to tests run with "./gradlew
> :beam-runners-direct-java:needsRunnerTests"? That command works fine for me
> in both versions of IDEA, but I believe the same tests fail if you run them
> directly through "./gradlew test".
>
> However, I am having issues with a bunch of validatesRunner tests, mostly
> caused by :beam-runners-google-cloud-dataflow-java:validatesRunner. Not
> sure if it's because of a code change or a gradle config, I'll keep looking
> into it.
>
> On Wed, Apr 11, 2018 at 11:01 AM Romain Manni-Bucau 
> wrote:
>
>> I got tests running by reconfiguring gradle (which was set up for another
>> project but seems beam didn't like it) but latency is still "high" using
>> gradle runner for tests (like Etienne said ~3s on an i7 with 16G vs a few
>> ms with default idea test runner, would be great to solve that).
>>
>> I also find the integration quite fishy because configurations are custom
>> so idea is kind of forced to propose your modules 3 times at least when you
>> select the classpath (x_test being generally the working one).
>>
>> Also the false positive you get if you forget a cleanX is a bit annoying,
>> maybe we should force a clean for test or at least when there is a --tests
>> to avoid gradle not running it because there is no diff.
>>
>> So it works but dev productivity is reduced a lot and it became slow
>> enough to think if you should do a contribution or not - at least for me.
>>
>> Le 11 avr. 2018 19:37, "Alexey Romanenko"  a
>> écrit :
>>
>>> I’ve managed to import a project as it’s described in documentation
>>> (starting from empty project) using Idea 2018 and run unit tests
>>> successfully.
>>> For some reason, tests that use DirectRunner to run a pipeline
>>> failed.
>>>
>>> WBR,
>>> Alexey
>>>
>>> On 11 Apr 2018, at 19:01, Daniel Oliveira 
>>> wrote:
>>>
>>> Hi everyone, I was the one who initially wrote the PR with Idea
>>> instructions <https://github.com/apache/beam-site/pull/414>. I was
>>> using 2017.3 as well while writing it so all the instructions were tested
>>> on that version. I'll try testing the instructions on 2018 to see if I can
>>> reproduce the issues people are having.
>>>
>>> On Wed, Apr 11, 2018 at 9:51 AM Lukasz Cwik  wrote:
>>>
>>>> I use 2017.3 and it has been reliable for me. I haven't tried 2018 yet.
>>>>
>>>> On Wed, Apr 11, 2018 at 11:30 AM Romain Manni-Bucau <
>>>> rmannibu...@gmail.com> wrote:
>>>>
>>>>> Any of you using the idea 2018? the import works for me but then it is
>>>>> not as smooth as it seems for you. I'm just trying to see if it is a
>>>>> procedure thing or a version issue.
>>>>>
>>>>> Romain Manni-Bucau
>>>>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>>>>
>>>>>
>>>>> 2018-04-11 17:28 GMT+02:00 Kenneth Knowles :
>>>>> > The only reason I did "empty project then add a module" procedure
>>>>> was to get
>>>>> > all the IntelliJ files outside the source tree. IIRC directly
>>>>> creating from
>>>>> > existing sources didn't give the necessary configuration options. If
>>>>> you
>>>>> > don't care about being

Re: Gradle Status [April 6]

2018-04-11 Thread Daniel Oliveira
Alexey, are you referring to tests run with "./gradlew
:beam-runners-direct-java:needsRunnerTests"? That command works fine for me
in both versions of IDEA, but I believe the same tests fail if you run them
directly through "./gradlew test".

However, I am having issues with a bunch of validatesRunner tests, mostly
caused by :beam-runners-google-cloud-dataflow-java:validatesRunner. I'm not
sure if it's because of a code change or a gradle config; I'll keep looking
into it.

On Wed, Apr 11, 2018 at 11:01 AM Romain Manni-Bucau 
wrote:

> I got tests running by reconfiguring gradle (it was set up for another
> project, and Beam apparently didn't like that), but latency is still "high"
> when using the gradle runner for tests (as Etienne said, ~3s on an i7 with
> 16G vs a few ms with the default IDEA test runner; it would be great to
> solve that).
>
> I also find the integration quite fishy because the configurations are
> custom, so IDEA is more or less forced to offer your modules at least 3
> times when you select the classpath (x_test generally being the working one).
>
> Also, the false positive you get if you forget a cleanX is a bit annoying.
> Maybe we should force a clean for tests, or at least when --tests is
> passed, so that gradle does not skip the run because there is no diff.
>
> So it works, but dev productivity is reduced a lot, and it has become slow
> enough to make you wonder whether to contribute at all - at least for me.
>
> On 11 Apr 2018 at 19:37, "Alexey Romanenko" wrote:
>
>> I’ve managed to import a project as it’s described in documentation
>> (starting from empty project) using Idea 2018 and run unit tests
>> successfully.
>> For some reason, the tests that use DirectRunner to run a pipeline
>> failed.
>>
>> WBR,
>> Alexey
>>
>> On 11 Apr 2018, at 19:01, Daniel Oliveira  wrote:
>>
>> Hi everyone, I was the one who initially wrote the PR with Idea
>> instructions <https://github.com/apache/beam-site/pull/414>. I was using
>> 2017.3 as well while writing it so all the instructions were tested on that
>> version. I'll try testing the instructions on 2018 to see if I can
>> reproduce the issues people are having.
>>
>> On Wed, Apr 11, 2018 at 9:51 AM Lukasz Cwik  wrote:
>>
>>> I use 2017.3 and it has been reliable for me. I haven't tried 2018 yet.
>>>
>>> On Wed, Apr 11, 2018 at 11:30 AM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
>>>> Any of you using the idea 2018? the import works for me but then it is
>>>> not as smooth as it seems for you. I'm just trying to see if it is a
>>>> procedure thing or a version issue.
>>>>
>>>> Romain Manni-Bucau
>>>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>>>
>>>>
>>>> 2018-04-11 17:28 GMT+02:00 Kenneth Knowles :
>>>> > The only reason I did "empty project then add a module" procedure was
>>>> > to get all the IntelliJ files outside the source tree. IIRC directly
>>>> > creating from existing sources didn't give the necessary configuration
>>>> > options. If you don't care about being able to `git clean` then you can
>>>> > do the shorter version. Also the particular UI for creation might have
>>>> > improved since I tried it. I'll do it again.
>>>> >
>>>> > On the pom, since it is only generated for machine reading, why care if
>>>> > the parent is inlined? I actually prefer to avoid coupling with things
>>>> > that you have to go and look up. Using inheritance is OK for human
>>>> > edited poms (actually IMO it is still a mistake) but it doesn't change
>>>> > the semantics of a shipped pom if they are all immutable, which they
>>>> > should be. It is better to have all the info right there. Is there an
>>>> > actually effective difference?
>>>> >
>>>> > Kenn
>>>> >
>>>> > On Wed, Apr 11, 2018 at 6:53 AM Etienne Chauchot <
>>>> echauc...@apache.org>
>>>> > wrote:
>>>> >>
>>>> >> Hi all,
>>>> >> I just tested gradle environment from a fresh source clone with this
>>>> >> procedure with just a tiny change: I used "new project from existing
>>>> >> sources" rather than create empty project and then add 

Re: Gradle Status [April 6]

2018-04-11 Thread Daniel Oliveira
Hi everyone, I was the one who initially wrote the PR with Idea instructions
<https://github.com/apache/beam-site/pull/414>. I was using 2017.3 as well
while writing it so all the instructions were tested on that version. I'll
try testing the instructions on 2018 to see if I can reproduce the issues
people are having.

On Wed, Apr 11, 2018 at 9:51 AM Lukasz Cwik  wrote:

> I use 2017.3 and it has been reliable for me. I haven't tried 2018 yet.
>
> On Wed, Apr 11, 2018 at 11:30 AM Romain Manni-Bucau 
> wrote:
>
>> Any of you using the idea 2018? the import works for me but then it is
>> not as smooth as it seems for you. I'm just trying to see if it is a
>> procedure thing or a version issue.
>>
>> Romain Manni-Bucau
>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>>
>>
>> 2018-04-11 17:28 GMT+02:00 Kenneth Knowles :
>> > The only reason I did "empty project then add a module" procedure was
>> > to get all the IntelliJ files outside the source tree. IIRC directly
>> > creating from existing sources didn't give the necessary configuration
>> > options. If you don't care about being able to `git clean` then you can
>> > do the shorter version. Also the particular UI for creation might have
>> > improved since I tried it. I'll do it again.
>> >
>> > On the pom, since it is only generated for machine reading, why care if
>> > the parent is inlined? I actually prefer to avoid coupling with things
>> > that you have to go and look up. Using inheritance is OK for human
>> > edited poms (actually IMO it is still a mistake) but it doesn't change
>> > the semantics of a shipped pom if they are all immutable, which they
>> > should be. It is better to have all the info right there. Is there an
>> > actually effective difference?
>> >
>> > Kenn
>> >
>> > On Wed, Apr 11, 2018 at 6:53 AM Etienne Chauchot 
>> > wrote:
>> >>
>> >> Hi all,
>> >> I just tested gradle environment from a fresh source clone with this
>> >> procedure with just a tiny change: I used "new project from existing
>> >> sources" rather than create empty project and then add module.
>> >>
>> >> It works fine, and junit runs from IntelliJ also work. With gradle we
>> >> pay a 2s delay (on my machine) before running the actual test for each
>> >> run. This delay seems quite constant no matter the module: I have run
>> >> all the tests at once in runner-core and a single test in another
>> >> module with a similar delay.
>> >>
>> >> I also tested a debug session from intellij and it runs fine also.
>> >>
>> >> I'll do some more testing and keep you posted.
>> >>
>> >> I still have some concerns about the potential difficulty for new
>> >> contributors of having to learn gradle in addition to Beam itself,
>> >> compared to maven, which is more mainstream for java developers. That
>> >> could be discouraging, e.g. for part-time contributors.
>> >>
>> >> Etienne
>> >>
>> >> On Tuesday, 10 April 2018 at 14:38 +, Lukasz Cwik wrote:
>> >>
>> >> beam-site PR/414 updates the instructions for using Intellij and how to
>> >> import a module:
>> >>
>> >> 1. Create an empty IntelliJ project outside of the Beam source tree.
>> >> 2. Under Project Structure > Project, select a Project SDK.
>> >> 3. Under Project Structure > Modules, click the + sign to add a module
>> >>    and select "Import Module".
>> >>    1. Select the directory containing the Beam source tree.
>> >>    2. Tick the "Import module from external model" button and select
>> >>       Gradle from the list.
>> >>    3. Tick the following boxes.
>> >>       * Use auto-import
>> >>       * Create separate module per source set
>> >>       * Store generated project files externally
>> >>       * Use default gradle wrapper
>> >> 4. Delegate build actions to Gradle by going to Settings > Build,
>> >>    Execution, Deployment > Build Tools > Gradle and checking "Delegate
>> >>    IDE build/run actions to gradle".
>> >>
>> >> On Tue, Apr 10, 2018 at 10:34 AM Jean-Baptiste Onofré wrote:
>> >>
>> >> That's a very important issue for contribution. Up to now, I used Maven
>> >> for setup IntelliJ (and it works just fine). If we remove the pom.xml,
>> >> we have to support Eclipse and IntelliJ "smoothly".
>> >>
>> >> Let me try in IntelliJ.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 10/04/2018 15:21, Romain Manni-Bucau wrote:
>> >> > You dont have issue due to the build setup with that option. I get:
>> >> >
>> >> > avr. 10, 2018 3:20:10 PM
>> >> > org.apache.beam.runners.direct.DirectTransformExecutor run
>> >> > GRAVE: Error occurred within
>> >> > org.apache.beam.runners.direct.DirectTransformExecutor@66761b7a
>> >> > com.google.common.util.concurrent.ExecutionError:
>> >> > java.lang.NoClassDefFoundError: net/bytebuddy/NamingStrategy
>> >> >
>> >> > ?
>> >> >
>> >> > Romain Manni-Bucau
>> >> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn | Book
>> >> >
>> >> >
>> >> > 2018-04-10 15:13 GMT+02:00 Lukasz Cwik :
>> >> >> I have f

Gradle questions on Eclipse and End to End tests

2018-04-05 Thread Daniel Oliveira
So I'm working on updating the Beam Contributor's Guide to swap references
to Maven with Gradle. I'm wondering if anyone can help with two trouble
spots I'm having:

1. Has anyone set up Eclipse to work with Gradle for beam development? If
so can you give me a description of how that's done for this page? Beam
Eclipse Tips <https://beam.apache.org/contribute/eclipse/>

2. I'm having trouble finding relevant information for this section: E2E
Testing Framework
<https://beam.apache.org/contribute/testing/#e2e-testing-framework>.
Does anyone know what the progress is on E2E tests in Gradle?

Thanks
Daniel Oliveira


Re: [ANNOUCEMENT] New Foundation members!

2018-04-03 Thread Daniel Oliveira
Congrats!

On Tue, Apr 3, 2018 at 2:05 AM Etienne Chauchot 
wrote:

> Congrats
> On Tuesday, 03 April 2018 at 10:41 +0200, Kostas Kloudas wrote:
>
> Congratulations to everyone!
>
> On Apr 2, 2018, at 9:14 PM, Kenneth Knowles  wrote:
>
> Congratulations!
>
> On Mon, Apr 2, 2018 at 11:44 AM Alan Myrvold  wrote:
>
> Congratulations!
>
> On Mon, Apr 2, 2018 at 9:14 AM Scott Wegner  wrote:
>
> Congrats!
>
> On Sat, Mar 31, 2018 at 12:18 PM Robert Burke  wrote:
>
> Congratulations!
>
> On Sat, 31 Mar 2018 at 11:53 Ekrem Aksoy  wrote:
>
> Congrats!
>
> On Sat, Mar 31, 2018 at 2:08 AM, Davor Bonaci  wrote:
>
> Now that this is public... please join me in welcoming three newly elected
> members of the Apache Software Foundation with ties to this community, who
> were elected during the most recent Members' Meeting.
>
> * Ismaël Mejía (Beam PMC)
>
> * Josh Wills (Crunch Chair; Beam, DataFu PMC)
>
> * Holden Karau (Spark, SystemML PMC; Mahout, Subversion committer; Beam
> contributor)
>
> These individuals demonstrated merit in Foundation's growth, evolution,
> and progress. They were recognized, nominated, and elected by existing
> membership for their significant impact to the Foundation as a whole, such
> as the roots of project-related and cross-project activities.
>
> As members, they now become legal owners and shareholders of the
> Foundation. They can vote for the Board, incubate new projects, nominate
> new members, participate in any PMC-private discussions, and contribute to
> any project.
>
> (For the Beam community, this election nearly doubles the number of
> Foundation members. The new members are joining Jean-Baptiste Onofré,
> Stephan Ewen, Romain Manni-Bucau and myself in this role.)
>
> I'm happy to be able to call all three of you my fellow members.
> Congratulations!
>
> Davor
>
>
>
>
> --
>
>
> Got feedback? http://go/swegner-feedback
>
>
>
>


Re: Design specs for portable Combine

2018-03-16 Thread Daniel Oliveira
So since I made some updates to the doc I feel like this is a good time to
add a summary (I didn't know I needed to do that when I originally sent it
out).

Structure and Lifting of Combines (In Apache Beam Portability)
This doc covers how Combines will be modeled in the Runner API and Fn API,
as well as how the model should be used to perform Combines in different
ways and how this model can be expanded on in the future. Some of the
important points:

   - Combines are modeled by having transforms with CombinePayloads and one
   of several URNs.
   - In the pipeline the Combine is a composite transform with its
   subtransforms describing the implementation.
   - URNs are provided for the Combine Per Key composite transform, and the
   steps Pre-Combine, Merge Accumulators, Extract Output.
   - Non-lifted Combines are implemented as a GroupByKey -> ParDo.
   - Lifted Combines are implemented as Pre-Combine -> GroupByKey -> Merge
   Accumulators -> Extract Output.
   - Side inputs are not described in the model as they can rarely be
   lifted. Combines with side inputs are modeled as GroupByKey -> ParDo.
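
To make the two shapes above concrete, here is a minimal plain-Python sketch
(not Beam code and not the Runner API encoding from the doc). MeanFn,
group_by_key, non_lifted_combine and lifted_combine are hypothetical names
invented for this illustration, and "bundles" are just lists standing in for
the chunks of input each worker would see before the shuffle. It only aims to
show that Pre-Combine -> GroupByKey -> Merge Accumulators -> Extract Output
gives the same answer as GroupByKey -> ParDo.

```python
from collections import defaultdict


class MeanFn:
    """Hypothetical mean combiner; the accumulator is a (sum, count) pair."""

    def create_accumulator(self):
        return (0, 0)

    def add_input(self, acc, value):
        return (acc[0] + value, acc[1] + 1)

    def merge_accumulators(self, accs):
        total, count = 0, 0
        for s, c in accs:
            total += s
            count += c
        return (total, count)

    def extract_output(self, acc):
        return acc[0] / acc[1]


def group_by_key(pairs):
    """Stand-in for GroupByKey: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def non_lifted_combine(fn, bundles):
    """GroupByKey -> ParDo: every raw value goes through the grouping step."""
    all_pairs = [pair for bundle in bundles for pair in bundle]
    result = {}
    for key, values in group_by_key(all_pairs).items():
        acc = fn.create_accumulator()
        for value in values:
            acc = fn.add_input(acc, value)
        result[key] = fn.extract_output(acc)
    return result


def lifted_combine(fn, bundles):
    """Pre-Combine -> GroupByKey -> Merge Accumulators -> Extract Output."""
    partials = []
    for bundle in bundles:
        # Pre-Combine: build one accumulator per key within each bundle.
        accs = {}
        for key, value in bundle:
            acc = accs.get(key)
            if acc is None:
                acc = fn.create_accumulator()
            accs[key] = fn.add_input(acc, value)
        partials.extend(accs.items())
    # Only the per-bundle accumulators are grouped, then merged and extracted.
    return {
        key: fn.extract_output(fn.merge_accumulators(accs))
        for key, accs in group_by_key(partials).items()
    }


bundles = [
    [("a", 1), ("b", 10), ("a", 3)],
    [("a", 5), ("b", 20)],
]
assert non_lifted_combine(MeanFn(), bundles) == lifted_combine(MeanFn(), bundles)
```

In the lifted version only one accumulator per key per bundle reaches the
grouping step, which is the point of lifting.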



On Fri, Mar 9, 2018 at 10:19 AM Daniel Oliveira 
wrote:

> Hi everyone, I'm going to be working on getting Combines working with
> portable pipelines, and I've written up a design for how to model them. If
> anyone's interested in portability please check it out and provide any
> feedback you may have.
>
> *https://s.apache.org/beam-runner-api-combine-model
> <https://s.apache.org/beam-runner-api-combine-model>*
>
> *One part I'm curious for community feedback on is the idea of disabling
> Combiner lifting for Combines with side inputs. I mention it here
> <https://docs.google.com/document/d/1-3mEs3Y7bIkJ0hmQ6SiHpVIFu5vbY6Zcpw-7tOMVg4U/edit#bookmark=id.ur8f96unbqx8>.
> Please let me know if you have objections to that idea.*
>
> *Thank you,*
> *Daniel Oliveira*
>


Design specs for portable Combine

2018-03-09 Thread Daniel Oliveira
Hi everyone, I'm going to be working on getting Combines working with
portable pipelines, and I've written up a design for how to model them. If
anyone's interested in portability please check it out and provide any
feedback you may have.

*https://s.apache.org/beam-runner-api-combine-model
<https://s.apache.org/beam-runner-api-combine-model>*

*One part I'm curious for community feedback on is the idea of disabling
Combiner lifting for Combines with side inputs. I mention it here
<https://docs.google.com/document/d/1-3mEs3Y7bIkJ0hmQ6SiHpVIFu5vbY6Zcpw-7tOMVg4U/edit#bookmark=id.ur8f96unbqx8>.
Please let me know if you have objections to that idea.*

*Thank you,*
*Daniel Oliveira*


Re: Dataflow runner examples build fail

2018-01-08 Thread Daniel Oliveira
+1

On Mon, Jan 8, 2018 at 10:07 AM, Kenneth Knowles  wrote:

> +1
>
> On Mon, Jan 8, 2018 at 9:33 AM, Henning Rohde  wrote:
>
>> +1
>>
>> On Mon, Jan 8, 2018 at 1:32 AM, Ted Yu  wrote:
>>
>>> +1
>>>
>>>  Original message 
>>> From: Jean-Baptiste Onofré 
>>> Date: 1/8/18 1:26 AM (GMT-08:00)
>>> To: dev@beam.apache.org
>>> Subject: Dataflow runner examples build fail
>>>
>>> Hi guys,
>>>
>>> The PRs and nightly builds are failing due to an issue with the dataflow
>>> platform: it seems we have a disk quota exceeded on the us-central1
>>> region.
>>>
>>> I would like to do a clean out and increase the quota a bit.
>>>
>>> Thoughts ?
>>>
>>> Thanks
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
>>
>


Re: Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-26 Thread Daniel Oliveira
Yes, just as Ismaël said, it's a compilation blocker right now even though
(I believe) we don't use the extension that's breaking.

As for other ways to solve this, if there is a way to avoid compiling the
advanced features of AutoValue that might be worth a try. We could also try
to get a release of AutoValue with the fix that works in Java 7. However I
feel that slowly moving over to Java 8 is the most future-proof solution if
it's possible.

On Tue, Sep 26, 2017 at 2:47 PM, Ismaël Mejía  wrote:

> The current issue is that compilation fails on master because Beam's
> parent pom is configured to fail if it finds warnings:
>
> -Werror
>
> However if you remove that line from the parent pom the compilation passes.
>
> Of course this does not mean that everything is solved for Java 9,
> there are some tests that break and other issues because of other
> plugins and dependencies (e.g. bytebuddy), but those are not part of
> this discussion.
>
> On Tue, Sep 26, 2017 at 11:38 PM, Eugene Kirpichov
>  wrote:
> > AFAIK we don't use any advanced capabilities of AutoValue. Does that mean
> > this issue is moot? I didn't quite understand from your email whether it
> > is a compilation blocker for Beam or not.
> >
> > On Tue, Sep 26, 2017 at 2:32 PM Ismaël Mejía  wrote:
> >
> >> Great that you are working on this too, Daniel, and thanks for
> >> bringing this subject to the mailing list. I was waiting for my return
> >> to the office next week, but you did it first :)
> >>
> >> Eugene for reference (This is the issue on the migration to Java 9),
> >> notice that here the goal is first that beam passes mvn clean install
> >> with pure Java 9 (and also add this to jenkins), not to rewrite
> >> anything to use the new stuff (e.g. modules):
> >> https://issues.apache.org/jira/browse/BEAM-2530
> >>
> >> Eugene can you also PTAL at the AutoValue issue, more details on the
> >> issue, this is a warning so I don't know if it is really critical in
> >> particular because we are not using Memoization (do we?).
> >> https://github.com/google/auto/issues/503
> >>
> >> https://github.com/google/auto/commit/71514081f2ca6fb4ead2b7f0a25f5d02247b8532
> >>
> >> Wouldn't the easiest way be that you guys convince the google.auto
> >> guys to generate that simple fix in a Java 7 compatible way and
> >> 'voila' ?
> >>
> >> However I agree that moving to Java 8 is an excellent idea and as
> >> Eugene mentions there is less friction now since most projects are
> >> moving, only pending issue are existing clusters on java 7 in the
> >> hadoop world, but those are less frequent now. Anyway this discussion
> >> is really important (maybe even worth a vote). Because moving to Java
> >> 8 will allow us also to move some of the dependencies that we are
> >> keeping in old versions and in general to move forward.
> >>
> >> What do the others think ?
> >>
> >>
> >>
> >> On Tue, Sep 26, 2017 at 11:09 PM, Eugene Kirpichov
> >>  wrote:
> >> > Very excited to hear that there's work on JDK9 support - is there a
> >> > public description of the plans for this work somewhere?
> >> >
> >> > In general, Beam could probably drop Java7 support altogether at some
> >> > point soon: Java7 reached end-of-life (i.e. it's not receiving even
> >> > security patches) 2 years ago, and all major players in the data
> >> > processing ecosystem have dropped Java7 support (Spark, Flink, Hadoop),
> >> > so I presume the demand for Java7 support in the data processing
> >> > industry is low. By the way: would a Java8 migration be in the scope of
> >> > your work in general?
> >> >
> >> > However, until we say that Beam requires Java8, what would be the
> >> > implications of using a version of AutoValue that can only be compiled
> >> > with Java8? Are you saying that this is simply a matter of a compiler
> >> > bug, and if we use a Java8 compiler but configured to use source and
> >> > target versions of 1.7 and using bootclasspath of rt.jar from 1.7, then
> >> > the resulting Beam artifacts will be usable by people who don't have
> >> > Java8?
> >> > On Tue, Sep 26, 2017 at 1:53 PM Daniel Oliveira
> >> >  wrote:
> >> >
> >> >> So I've been working on JDK 9 support for Beam, and I have a bug in
> >> >> AutoValue that can be fixed by updating our AutoValue dependency to
> >> >> the latest. The problem is that AutoValue from 1.5+ seems to be banned
> >> >> for Beam due to requiring Java 8 compilers. However, it should still
> >> >> be possible to compile and execute Java 7 code from the Java 8
> >> >> compiler by building with the correct arguments. So the fix to this
> >> >> bug would essentially require Java 8 compilers even for compiling
> >> >> Java 7 code.
> >> >>
> >> >> Does anyone need to use Java 7 compilers? Because if not I would
> >> >> like to continue with this fix.
> >> >>
> >>
>


Possibility of requiring Java 8 compiler for building Java 7 sources?

2017-09-26 Thread Daniel Oliveira
So I've been working on JDK 9 support for Beam, and I have a bug in
AutoValue that can be fixed by updating our AutoValue dependency to the
latest. The problem is that AutoValue from 1.5+ seems to be banned for Beam
due to requiring Java 8 compilers. However, it should still be possible to
compile and execute Java 7 code from the Java 8 compiler by building with
the correct arguments. So the fix to this bug would essentially require
Java 8 compilers even for compiling Java 7 code.

Does anyone need to use Java 7 compilers? Because if not I would like to
continue with this fix.
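
For illustration, the "correct arguments" would presumably be javac's
-source, -target and -bootclasspath options. Below is a small Python sketch
that assembles such a command line. The helper name cross_compile_command,
the rt.jar path, the output directory and Example.java are all placeholders
made up for this example; Beam's actual build goes through Maven rather than
calling javac directly.

```python
def cross_compile_command(sources, java7_rt_jar, out_dir="classes"):
    """Build a javac (JDK 8) command line that targets Java 7.

    -source/-target select the Java 7 language level and class file version;
    -bootclasspath points at the Java 7 class library so that Java 8-only
    APIs are rejected at compile time. All paths here are placeholders.
    """
    return [
        "javac",
        "-source", "1.7",
        "-target", "1.7",
        "-bootclasspath", java7_rt_jar,
        "-d", out_dir,
    ] + list(sources)


cmd = cross_compile_command(["Example.java"], "/path/to/jdk7/jre/lib/rt.jar")
print(" ".join(cmd))
```

Running the resulting command needs both a JDK 8 javac on the PATH and a
Java 7 rt.jar available locally.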


New contributor

2017-09-13 Thread Daniel Oliveira
Hi everyone,

My name's Daniel Oliveira. I work at Google and I'd like to start
contributing to this project so I wanted to introduce myself.

I've already read through the contribution guide and I'm excited to start
making contributions soon!

Thank you,
Daniel Oliveira