Re: An update on Python postcommit tests

2019-07-19 Thread Ahmet Altay
Thank you! Hopefully this will improve the state of testing.

On Fri, Jul 19, 2019 at 2:53 PM Valentyn Tymofieiev 
wrote:

> We have split the Python 2 and Python 3.5 - 3.7 postcommit test suites into
> individual Jenkins jobs, each of which can be triggered with its own phrase.
>
> "Run Python PostCommit" no longer triggers anything.
>
> Use the following trigger phrases to start or re-run postcommits on your PR:
>
> Run Python 2 PostCommit
> Run Python 3.5 PostCommit
> Run Python 3.6 PostCommit
> Run Python 3.7 PostCommit
>
> Trigger phrases are case-insensitive. Also, at the bottom of the PR
> template there is a glossary of all jobs and trigger phrases.
>
> Thanks,
> Valentyn
>


An update on Python postcommit tests

2019-07-19 Thread Valentyn Tymofieiev
We have split the Python 2 and Python 3.5 - 3.7 postcommit test suites into
individual Jenkins jobs, each of which can be triggered with its own phrase.

"Run Python PostCommit" no longer triggers anything.

Use the following trigger phrases to start or re-run postcommits on your PR:

Run Python 2 PostCommit
Run Python 3.5 PostCommit
Run Python 3.6 PostCommit
Run Python 3.7 PostCommit

Trigger phrases are case-insensitive. Also, at the bottom of the PR
template there is a glossary of all jobs and trigger phrases.

Thanks,
Valentyn
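
To illustrate what "case-insensitive" means for the phrases above, here is a
minimal sketch. It is not how Jenkins or its GitHub plugin actually matches
comments; the function name `match_trigger` and the whole-comment matching
rule are assumptions made only for this example.

```python
# Trigger phrases copied from the announcement above.
TRIGGER_PHRASES = [
    "Run Python 2 PostCommit",
    "Run Python 3.5 PostCommit",
    "Run Python 3.6 PostCommit",
    "Run Python 3.7 PostCommit",
]


def match_trigger(comment):
    """Return the canonical phrase that a PR comment matches, or None.

    Assumption (hypothetical rule): the whole comment, ignoring surrounding
    whitespace, must equal a phrase when compared case-insensitively.
    """
    normalized = comment.strip().casefold()
    for phrase in TRIGGER_PHRASES:
        if normalized == phrase.casefold():
            return phrase
    return None
```

Under this rule "run python 3.6 postcommit" matches, while the retired
"Run Python PostCommit" matches nothing.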


Re: How to run DynamoDBIOTest?

2019-07-19 Thread Anton Kedin
One of the machines is macOS 10.14.5, docker desktop 2.0.0.3 (engine
18.09.2), java 1.8.0_211 (I believe Oracle version). Test log:
https://gist.github.com/akedin/da6fbc8a993f758302a6f64c42bba11b#file-gistfile1-txt
It then spins forever with only gradle logs. Another machine I tried it on is
something debian-based, open jdk 8 212, docker 18.09.3, logs are similar
(don't have access to the details at the moment).

Regards,
Anton

On Fri, Jul 19, 2019 at 2:01 PM Ismaël Mejía  wrote:

> This looks weird, I ran the build on my machine (Ubuntu linux 18.04 +
> OpenJDK 1.8.0_201) + docker 18.09.8 on both master and the release
> 2.14.0 branch and it executes without issue.
> This test uses a docker image as a sort of embedded server to simulate
> the Amazon backend (localstack).
> All builds were green when merged. Do you get any extra logs Anton?
> what is your OS / Java version?
> Adding Cam to the discussion since he contributed this feature to see
> if he may have any extra context.
>
> On Fri, Jul 19, 2019 at 7:15 PM Anton Kedin  wrote:
> >
> > Hi dev@,
> >
> > Does anyone know if there's anything extra needed to run
> `DynamoDBIOTest`? If I do `./gradlew
> :sdks:java:io:amazon-web-services:build --debug` it passes a few tests during
> `:test` but then seems to sit on `DynamoDBIOTest` forever. No errors, last
> meaningful log is `INFO: Container localstack/localstack:0.8.6 started`.
> Happens on different machines, both on master and release-2.14.0 branches.
> >
> > Any pointers?
> >
> > Regards,
> > Anton
>


Re: How to run DynamoDBIOTest?

2019-07-19 Thread Ismaël Mejía
This looks weird, I ran the build on my machine (Ubuntu linux 18.04 +
OpenJDK 1.8.0_201) + docker 18.09.8 on both master and the release
2.14.0 branch and it executes without issue.
This test uses a docker image as a sort of embedded server to simulate
the Amazon backend (localstack).
All builds were green when merged. Do you get any extra logs Anton?
what is your OS / Java version?
Adding Cam to the discussion since he contributed this feature to see
if he may have any extra context.

On Fri, Jul 19, 2019 at 7:15 PM Anton Kedin  wrote:
>
> Hi dev@,
>
> Does anyone know if there's anything extra needed to run `DynamoDBIOTest`? If 
> I do `./gradlew :sdks:java:io:amazon-web-services:build --debug` it passes a
> few tests during `:test` but then seems to sit on `DynamoDBIOTest` forever.
> No errors, last meaningful log is `INFO: Container 
> localstack/localstack:0.8.6 started`. Happens on different machines, both on 
> master and release-2.14.0 branches.
>
> Any pointers?
>
> Regards,
> Anton


Jenkins failures / dependency downloads / gradle caching

2019-07-19 Thread Kenneth Knowles
Hi all,

Are we breaking the local maven central cache? I've been unable to get a
green Java PreCommit, and the failures appear to come entirely from trouble
downloading dependencies that I would not expect to require a download.

Separately, something strange seems to be going on with maven central, or
with our relationship to it, such that this causes failures rather than just
slow builds.

Kenn


Re: precommits failing on git clean:

2019-07-19 Thread Udi Meiri
Is this a regression? Is it due to an in-progress PR? I can't figure out
where the module go.opencensus.io@v0.22.0 is included.

On Fri, Jul 19, 2019 at 11:59 AM Robert Burke  wrote:

> First time contributor Zach might have a solution in this PR, but it seems
> like it would need care since it's pretty broad.
>
> https://github.com/apache/beam/pull/9096
>
> On Fri, Jul 19, 2019, 11:53 AM Udi Meiri  wrote:
>
>> https://issues.apache.org/jira/browse/BEAM-7788
>>
>


Re: [2.14.0] Release Progress Update

2019-07-19 Thread Anton Kedin
Verification build succeeds except for AWS IO (which has tests hanging). I
will continue the release process as normal and will investigate the AWS IO
issue meanwhile. I will either disable the hanging tests to get the artifacts
for an RC, or continue without AWS IO temporarily; either way it will need to
be re-validated when the issue is resolved.

Regards,
Anton

On Thu, Jul 18, 2019 at 8:54 AM Anton Kedin  wrote:

> All cherry-picks are merged, blocker jiras closed, running the
> verification build.
>
> On Mon, Jul 15, 2019 at 4:53 PM Ahmet Altay  wrote:
>
>> Anton, any updates on this release? Do you need help?
>>
>> On Fri, Jun 28, 2019 at 11:42 AM Anton Kedin  wrote:
>>
>>> I have been running validation builds (had some hiccups with that),
>>> everything looks mostly good, except failures in `:beam-test-tools` and
>>> `:io:aws`. Now I will start cherry-picking other fixes and trying to figure
>>> the specific issues out.
>>>
>>> Regards,
>>> Anton
>>>
>>> On Fri, Jun 21, 2019 at 3:17 PM Anton Kedin  wrote:
>>>
>>>> Not much progress today. Debugging build issues when running global
>>>> `./gradlew build -PisRelease --scan`
>>>>
>>>> Regards,
>>>> Anton
>>>>
>>>> On Thu, Jun 20, 2019 at 4:12 PM Anton Kedin  wrote:
>>>>
>>>>> Published the snapshots, working through the verify_release_validation
>>>>> script
>>>>>
>>>>> Got another blocker to be cherry-picked when merged:
>>>>> https://issues.apache.org/jira/browse/BEAM-7603
>>>>>
>>>>> Regards,
>>>>> Anton
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2019 at 4:17 PM Anton Kedin  wrote:
>>>>>
>>>>>> I have cut the release branch for 2.14.0 and working through the
>>>>>> release process. Next step is building the snapshot and release branch
>>>>>> verification.
>>>>>>
>>>>>> There are two issues [1] that are still not resolved that are marked
>>>>>> as blockers at the moment:
>>>>>>  * [2] BEAM-7478 - remote cluster submission from Flink Runner broken;
>>>>>>  * [3] BEAM-7424 - retries for GCS;
>>>>>>
>>>>>> [1]
>>>>>> https://issues.apache.org/jira/browse/BEAM-7478?jql=project%20%3D%20BEAM%20%20AND%20fixVersion%20%3D%202.14.0%20AND%20status%20!%3D%20Closed%20AND%20status%20!%3DResolved
>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-7478
>>>>>> [3] https://issues.apache.org/jira/browse/BEAM-7424
>>>>>>
>>>>>> Regards,
>>>>>> Anton
>>>>>>
>>>>>


Re: precommits failing on git clean:

2019-07-19 Thread Robert Burke
First time contributor Zach might have a solution in this PR, but it seems
like it would need care since it's pretty broad.

https://github.com/apache/beam/pull/9096

On Fri, Jul 19, 2019, 11:53 AM Udi Meiri  wrote:

> https://issues.apache.org/jira/browse/BEAM-7788
>


precommits failing on git clean:

2019-07-19 Thread Udi Meiri
https://issues.apache.org/jira/browse/BEAM-7788


How to run DynamoDBIOTest?

2019-07-19 Thread Anton Kedin
Hi dev@,

Does anyone know if there's anything extra needed to run `DynamoDBIOTest`?
If I do `./gradlew :sdks:java:io:amazon-web-services:build --debug` it
passes a few tests during `:test` but then seems to sit on `DynamoDBIOTest`
forever. No errors, last meaningful log is `INFO: Container
localstack/localstack:0.8.6 started`. Happens on different machines, both
on master and release-2.14.0 branches.

Any pointers?

Regards,
Anton
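
For anyone trying to reproduce the hang, a small wrapper that gives up after a
deadline may be handier than watching the `--debug` output. This is only a
sketch: the `:test` task path and the `--tests` filter are assumptions about
how to narrow the run to `DynamoDBIOTest` (the command in the message above
uses `:build`), and `run_with_timeout` is a hypothetical helper, not part of
the Beam build.

```python
import subprocess
import sys

# Assumed invocation: the module's `test` task, narrowed with Gradle's
# `--tests` filter. Adjust to whatever actually reproduces the hang.
GRADLE_CMD = [
    "./gradlew",
    ":sdks:java:io:amazon-web-services:test",
    "--tests", "*DynamoDBIOTest",
    "--info",
]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished or had to be killed.

    Returns ("finished", returncode) or ("timed_out", None).
    subprocess.run kills the child process itself when the timeout expires.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timed_out", None)
    return ("finished", completed.returncode)
```

Calling `run_with_timeout(GRADLE_CMD, timeout_s=15 * 60)` from the repository
root would report `("timed_out", None)` if the test still sits there after 15
minutes. Note that a Gradle daemon or the localstack container may outlive the
killed process and need separate cleanup.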


Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-19 Thread Tim Sell
+1, I'd love to see this as a recording. Will you stick it up on youtube
afterwards?

On Thu, Jul 18, 2019 at 4:00 AM sridhar inuog 
wrote:

> Thanks, Pablo! Looking forward to it! Hopefully, it will also be recorded
> as well.
>
> On Wed, Jul 17, 2019 at 2:50 PM Pablo Estrada  wrote:
>
>> Yes! So I will be working on a small feature request for Java's
>> BigQueryIO: https://issues.apache.org/jira/browse/BEAM-7607
>>
>> Maybe I'll do something for Python next month. : )
>> Best
>> -P.
>>
>> On Wed, Jul 17, 2019 at 12:32 PM Rakesh Kumar 
>> wrote:
>>
>>> +1, I really appreciate this initiative. It would be really helpful for
>>> newbies like me.
>>>
>>> Is it possible to list out what are the things that you are planning to
>>> cover?
>>>
>>>
>>>
>>>
>>> On Tue, Jul 16, 2019 at 11:19 AM Yichi Zhang  wrote:
>>>
>>>> Thanks for organizing this Pablo, it'll be very helpful!
>>>>
>>>> On Tue, Jul 16, 2019 at 10:57 AM Pablo Estrada
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>> I'll be having a session where I live-fix a Beam bug for 1 hour next
>>>>> week. Everyone is invited.
>>>>>
>>>>> It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I will
>>>>> finish a full change in that time frame, but we'll see.
>>>>>
>>>>> I have not yet decided if I will do this via hangouts, or via a
>>>>> youtube livestream. In any case, I will share the link here in the next
>>>>> few days.
>>>>>
>>>>> I will most likely work on the Java SDK (I have a little feature
>>>>> request in mind).
>>>>>
>>>>> Thanks!
>>>>> -P.
>>>>>



Sort Merge Bucket - Action Items

2019-07-19 Thread Neville Li
Forking this thread to discuss action items regarding the change. We can
keep technical discussion in the original thread.

Background: our SMB POC showed promising performance & cost saving
improvements and we'd like to adopt it for production soon (by EOY). We
want to contribute it to Beam so it's better generalized and maintained. We
also want to avoid divergence between our internal version and the PR while
it's in progress, specifically any breaking change in the produced SMB data.

To achieve that I'd like to propose a few action items.

   1. Reach a consensus about bucket and shard strategy, key handling,
   bucket file and metadata format, etc., i.e. anything that affects the
   produced SMB data.
   2. Revise the existing PR according to #1
   3. Reduce duplicate file IO logic by reusing FileIO.Sink, Compression,
   etc., but keep the existing file level abstraction
   4. (Optional) Merge code into extensions::smb but mark it clearly
   as @Experimental
   5. Incorporate ideas from the discussion, e.g. ShardingFn,
   GroupByKeyAndSortValues, FileIO generalization, key URN, etc.

#1-4 give us something usable in the short term, while #1 guarantees that
production data produced today is usable when #5 lands on master. #4 also
gives early adopters a chance to give feedback.
Due to the scope of #5, it might take much longer and a couple of big PRs
to achieve, which we can keep iterating on.

What are your thoughts on this?
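
For readers new to the idea, here is a rough sketch of why the bucket strategy
in #1 has to be pinned down before data is written. It is not the code in the
PR: the CRC32 bucketing, the in-memory lists standing in for bucket files, and
all function names are assumptions made for illustration.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def bucket_id(key, num_buckets):
    # Assumed bucketing: a stable hash of the key modulo the bucket count.
    # Both datasets must use the same function and the same num_buckets.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_buckets(records, num_buckets):
    """Group (key, value) records into buckets and sort each bucket by key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))  # stable sort keeps value order per key
    return buckets


def _grouped(sorted_records):
    # Collapse a key-sorted list into [(key, [values...]), ...].
    return [(k, [v for _, v in g])
            for k, g in groupby(sorted_records, key=itemgetter(0))]


def merge_join_bucket(left, right):
    """Merge-join two key-sorted buckets; keep keys present on both sides."""
    lg, rg = _grouped(left), _grouped(right)
    i = j = 0
    joined = []
    while i < len(lg) and j < len(rg):
        lk, rk = lg[i][0], rg[j][0]
        if lk == rk:
            joined.append((lk, lg[i][1], rg[j][1]))
            i += 1
            j += 1
        elif lk < rk:
            i += 1
        else:
            j += 1
    return joined


def smb_join(left_buckets, right_buckets):
    """Join bucket i of one dataset with bucket i of the other, no shuffle."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("datasets were written with different bucket counts")
    result = []
    for left, right in zip(left_buckets, right_buckets):
        result.extend(merge_join_bucket(left, right))
    return result
```

The join only works because both sides agree on `bucket_id` and on the sort
order, which is why a later change to either would break data already written.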

On Thu, Jul 18, 2019 at 5:32 AM Robert Bradshaw  wrote:

> On Wed, Jul 17, 2019 at 9:12 PM Gleb Kanterov  wrote:
> >>
> >> Suppose one assigns a sharding function to a PCollection. Is it lazy,
> >> or does it induce a reshuffle right at that point? In either case,
> >> once the ShardingFn has been applied, how long does it remain in
> >> effect? Does it prohibit the runner (or user) from doing subsequent
> >> resharding (including dynamic load balancing)? What happens when one
> >> has a DoFn that changes the value? (Including the DoFns in our sinks
> >> that assign random keys.)
> >
> >
> > What if we would reason about sharding in the same way as we reason
> about timestamps?
> >
> > Please correct me if I am wrong, as I know, in Beam, timestamps exist
> for each element. You can get timestamp by using Reify.timestamps. If there
> are timestamped values, and they go through ParDo, timestamps are preserved.
>
> That is correct.
>
> > We can think of the same with sharding, where Reify.shards would be
> PTransform, ShardedValue> and ShardedValue would
> contain shard and a grouping key.
>
> Meaning the shard that the PCollection is currently sharded by, or the
> one that it should be sharded by in the future. (Your use case is a
> bit strange in that a single key may be spread across multiple shards,
> as long as they're part of the same "bucket.")
>
> > ParDo wouldn't change sharding and would propagate ShardingFn.
>
> The ShardingFn may not be applicable to downstream (mutated) elements.
>
> FYI, internally this is handled by having annotations on DoFns as
> being key-preserving, and only reasoning about operations separated by
> such DoFns.
>
> > CoGroupByKey on such PTransforms would reify grouping key, and do
> regular CoGroupByKey, or be rewritten to a regular ParDo if sharding of
> inputs is compatible.
> >
> > As you mentioned, it requires dynamic work rebalancing to preserve
> sharding. What if we do dynamic work rebalancing for each shard
> independently, as, I guess, it's done today for fixed windows.
>
> Currently, the unit of colocation is by key. Generally sharding
> introduces a notion of colocation where multiple keys (or multiple
> elements, I suppose it need not be keyed) are promised to be processed
> by the same machine. This is both too constraining (wrt dynamic
> resharding) and not needed (with respect to SMB, as your "colocation"
> is per bucket, but buckets themselves can be processed in a
> distributed manner).
>
> > When we do a split, we would split one shard into two. It should be
> possible to do consistently if values within buckets are sorted, in this
> case, we would split ranges of possible values.
>
> I'm not quite following here. Suppose one processes elements a, m, and
> z. Then one decides to split the bundle, but there's not a "range" we
> can pick for the "other" as this bundle already spans the whole range.
> But maybe I'm just off in the weeds here.
>
> > On Wed, Jul 17, 2019 at 6:37 PM Robert Bradshaw 
> wrote:
> >>
> >> On Wed, Jul 17, 2019 at 4:26 PM Gleb Kanterov  wrote:
> >> >
> >> > I find there is an interesting point in the comments brought by Ahmed
> Eleryan. Similar to WindowFn, having a concept of ShardingFn, that enables
> users to implement a class for sharding data. Each Beam node can have
> ShardingFn set, similar to WindowFn (or WindowingStrategy). Sinks and
> sources are aware of that and preserve this information. Using that it's
> possible to do optimization on Beam graph, removing redundant CoGroupByKey,
> and it would be 

Re: [Off for 3 weeks]

2019-07-19 Thread Kenneth Knowles
See you! Thanks for letting us know. I hope it is a nice break.

On Fri, Jul 19, 2019 at 2:45 AM Etienne Chauchot 
wrote:

> Hi guys,
>
> Just to let you know, I'll be off for 3 weeks starting tonight.
>
> See you when I get back
>
> Etienne
>


Re: Phrase triggering jobs problem

2019-07-19 Thread Michał Walenia
@Udi Meiri 
I did some research and reached out to Scott. It seems that the new plugin
doesn't support phrase triggering and Jenkins Job DSL that we use to create
jobs doesn't support it either.
I don't see a simple way to overcome problems related to the deactivation
of ghprb plugin.
I'm going to be unavailable for the next week, so if anyone else wants to
take the JIRA issue, feel free to do it.
Have a good weekend

Michal

On Fri, Jul 12, 2019 at 5:11 PM Michał Walenia 
wrote:

> Thanks for the heads up, I'll get in touch with him so that I don't
> duplicate the research.
>
>
> On Fri, Jul 12, 2019 at 3:55 PM Lukasz Cwik  wrote:
>
>> I believe Scott Wegner investigated the new plugin (about 10 months ago)
>> because it seemed like it could filter out running tests based upon paths
>> but it was lacking some other feature(s) that the old plugin had that we
>> were already using.
>>
>> On Fri, Jul 12, 2019 at 4:52 AM Katarzyna Kucharczyk <
>> ka.kucharc...@gmail.com> wrote:
>>
>>> Just for knowledge sharing purpose, here is a link to conversation
>>> about the new plugin.
>>>
>>> Kasia
>>>
>>> On Fri, Jul 12, 2019 at 10:47 AM Michał Walenia <
>>> michal.wale...@polidea.com> wrote:
>>>
>>>> Hi,
>>>> I think I'd like to take a look at it. I'll assign the issue to myself
>>>> and I'll keep you posted on my findings.
>>>>
>>>> Have a good day
>>>>
>>>> Michal
>>>>
>>>> On Thu, Jul 11, 2019 at 8:10 PM Udi Meiri  wrote:
>>>>
>>>>> Opened https://issues.apache.org/jira/browse/BEAM-7725 for migration
>>>>> off the old plugin onto the new (already deprecated I might add) plugin.
>>>>> Any takers?
>>>>>
>>>>> On Thu, Jul 11, 2019 at 10:53 AM Udi Meiri  wrote:
>>>>>
>>>>>> Okay, phrase triggering is working again (they re-enabled the
>>>>>> plugin). See notes in bug for details.
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 10:04 AM Udi Meiri  wrote:
>>>>>>
>>>>>>> I've opened a bug: https://issues.apache.org/jira/browse/BEAM-7723
>>>>>>> If anyone is working on this please assign yourself
>>>>>>>
>>>>>>> On Wed, Jul 10, 2019 at 5:57 PM Udi Meiri  wrote:
>>>>>>>
>>>>>>>> Thanks Kenn.
>>>>>>>>
>>>>>>>> On Wed, Jul 10, 2019 at 3:31 PM Kenneth Knowles
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Just noticed this thread. Infra turned off one of the GitHub
>>>>>>>>> plugins - the one we use. I forwarded the announcement. I'll see if
>>>>>>>>> we can get it back on for a bit so we can migrate off. I'm not sure
>>>>>>>>> if they have identical job DSL or not.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 10, 2019 at 12:32 PM Udi Meiri
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Still happening for me too.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> This has happened in the past. Usually there is some issue where
>>>>>>>>>>> Jenkins isn't notified of new PRs by Github or doesn't see the PR
>>>>>>>>>>> phrases and hence Jenkins sits around idle. This is usually fixed
>>>>>>>>>>> after a few hours without any action on our part.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
>>>>>>>>>>> ka.kucharc...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it's not duplicate but I can't find if any issue with
>>>>>>>>>>>> phrase triggering in Jenkins was already here.
>>>>>>>>>>>> Currently, I started third PR and no test were triggered there.
>>>>>>>>>>>> I tried to trigger some tests manually, but with no effect.
>>>>>>>>>>>>
>>>>>>>>>>>> Am I missing something?
>>>>>>>>>>>>
>>>>>>>>>>>> Here are links to my problematic PRs:
>>>>>>>>>>>> https://github.com/apache/beam/pull/9033
>>>>>>>>>>>> https://github.com/apache/beam/pull/9034
>>>>>>>>>>>> https://github.com/apache/beam/pull/9035
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Kasia
>>>>>>>>>>>>
>>>

 --

 Michał Walenia
 Polidea  | Software Engineer

 M: +48 791 432 002 <+48791432002>
 E: michal.wale...@polidea.com

 Unique Tech
 Check out our projects! 

>>>
>
> --
>
> Michał Walenia
> Polidea  | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: michal.wale...@polidea.com
>
> Unique Tech
> Check out our projects! 
>


-- 

Michał Walenia
Polidea  | Software Engineer

M: +48 791 432 002 <+48791432002>
E: michal.wale...@polidea.com

Unique Tech
Check out our projects! 


[Off for 3 weeks]

2019-07-19 Thread Etienne Chauchot
Hi guys,

Just to let you know, I'll be off for 3 weeks starting tonight.

See you when I get back

Etienne