Re: Ordering of element timestamp change and window function

2020-01-16 Thread Kenneth Knowles
On Thu, Jan 16, 2020 at 11:38 AM Robert Bradshaw 
wrote:

> On Thu, Jan 16, 2020 at 11:00 AM Kenneth Knowles  wrote:
> >
> > IIRC in Java it is forbidden to output an element with a timestamp
> outside its current window.
>
> I don't think this is checked anywhere. (Not sure how you would check
> it, as there's no generic window containment function--I suppose you
> could check if it's past the end of the window (and of course skew
> limits how far you can go back). I suppose you could try re-windowing
> and then fail if it didn't agree with what was already there.
>

I think you are right. This is governed by how a runner invokes utilities
from runners-core (output ultimately reaches this point without validation:
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L258
)
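For illustration, the kind of check discussed above could only enforce an upper bound. A minimal Python sketch (assuming the Python SDK's public IntervalWindow and BoundedWindow.max_timestamp(); the helper name is hypothetical, and this is not what any runner currently does):

    from apache_beam.transforms.window import IntervalWindow
    from apache_beam.utils.timestamp import Timestamp

    def validate_output_timestamp(window, timestamp):
        # Hypothetical check: reject outputs stamped past the end of the window.
        # Only an upper bound can be enforced; there is no generic containment
        # test, and skew limits how far back a timestamp may go.
        if timestamp > window.max_timestamp():
            raise ValueError(
                'output timestamp %s is past the end of window %s' % (timestamp, window))

    # A 5-second window starting at t=0.
    w = IntervalWindow(Timestamp(0), Timestamp(5))
    validate_output_timestamp(w, Timestamp(3))    # accepted
    # validate_output_timestamp(w, Timestamp(7))  # would raise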


> > An exception is outputs from @FinishBundle, where the output timestamp
> is required and the window is applied. TBH it seems more of an artifact of
> a mismatch between the pre-windowing and post-windowing worlds.
>
> Elements are always in some window, even if just the global window.
>

I mean that the existence of a window-unaware @FinishBundle method is an
artifact of the method existing prior to windowing as a concept. The idea
that a user can use a DoFn's local variables to buffer stuff and then
output in @FinishBundle predates the existence of windowing.
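As a concrete illustration of that buffering pattern in the Python SDK, here is a minimal sketch of a DoFn that collects elements in a local list and flushes it in finish_bundle; since there is no current element at that point, the output has to carry an explicit timestamp and window (the batch size and the choice of the global window are arbitrary assumptions, not a recommendation):

    import apache_beam as beam
    from apache_beam.transforms.window import GlobalWindow
    from apache_beam.utils.windowed_value import WindowedValue

    class BufferingDoFn(beam.DoFn):
        # Buffers elements in local state and emits the batch at bundle end.

        def start_bundle(self):
            self._buffer = []

        def process(self, element):
            self._buffer.append(element)
            if len(self._buffer) >= 100:  # arbitrary batch size
                yield self._buffer
                self._buffer = []

        def finish_bundle(self):
            if self._buffer:
                # No current element here, so the timestamp and window are
                # supplied explicitly; this sketch emits into the global window.
                gw = GlobalWindow()
                yield WindowedValue(self._buffer, gw.max_timestamp(), [gw])
                self._buffer = []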

> Most of the time, mixing processing across windows is simply wrong. But
> there are fears that calling @FinishBundle once per window would be a
> performance problem. On the other hand, don't most correct implementations
> have to separate processing for each window anyhow?
>
> Processing needs to be done per window iff the result depends on the
> window or if there are side effects.
>
> > Anyhow I think the Java behavior is better, so window assignment happens
> exactly and only at window transforms.
>
> But then one ends up with timestamps that are unrelated to the windows,
> right?
>

As far as the model goes, I think windows provide an upper bound but not a
lower bound. If we take the approach that windows are a "secondary key with
a max timestamp" then the timestamps should be related to the window in the
sense that they are <= the window's max timestamp.
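As a toy model of that framing (plain Python, not Beam code; the tuple layout is an illustrative assumption):

    from collections import defaultdict

    def group_by_key_and_window(elements):
        # Each element is (key, value, timestamp, window_max_ts). The window acts
        # as a secondary grouping key and only bounds the timestamp from above.
        groups = defaultdict(list)
        for key, value, timestamp, window_max_ts in elements:
            assert timestamp <= window_max_ts, 'timestamp exceeds window upper bound'
            groups[(key, window_max_ts)].append(value)
        return dict(groups)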

Kenn



> > Kenn
> >
> > On Wed, Jan 15, 2020 at 4:59 PM Ankur Goenka  wrote:
> >>
> >> The case where a plain vanilla value or a windowed value is emitted
> seems as expected as the user intent is honored without any surprises.
> >>
> >> If I understand correctly, in the case when the timestamp is changed,
> applying the window function again can have unintended behavior in the
> following cases:
> >> * Custom windows: User code can be executed in unintended order.
> >> * User emits a windowed value in a previous transform: Timestamping the
> value in this case would overwrite the user-assigned window in an earlier step
> even when the actual timestamp is the same. Semantically, emitting an
> element or a timestamped value with the same timestamp should have the same
> behaviour.
> >>
> >> What do you think?
> >>
> >>
> >> On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw 
> wrote:
> >>>
> >>> If an element is emitted with a timestamp, the window assignment is
> >>> re-applied at that time. At least that's how it is in Python. You can
> >>> emit the full windowed value (accepted without checking...), a
> >>> timestamped value (in which case the window will be computed), or a
> >>> plain old element (in which case the window and timestamp will be
> >>> computed (really, propagated)).
> >>>
> >>> On Wed, Jan 15, 2020 at 3:51 PM Ankur Goenka 
> wrote:
> >>> >
> >>> > Yup, this might result in unintended behavior as the timestamp is
> changed after the window assignment, as elements in windows do not have
> timestamps in the window time range.
> >>> >
> >>> > Shall we start validating at least one window assignment between
> timestamp assignment and GBK/triggers to avoid unintended behaviors
> mentioned above?
> >>> >
> >>> > On Wed, Jan 15, 2020 at 1:24 PM Luke Cwik  wrote:
> >>> >>
> >>> >> Window assignment happens at the point in the pipeline the
> WindowInto transform was applied. So in this case the window would have
> been assigned using the original timestamp.
> >>> >>
> >>> >> Grouping is by key and window.
> >>> >>
> >>> >> On Tue, Jan 14, 2020 at 7:30 PM Ankur Goenka 
> wrote:
> >>> >>>
> >>> >>> Hi,
> >>> >>>
> >>> >>> I am not sure about the effect of the order of element timestamp
> change and window association has on a group by key.
> >>> >>> More specifically, what would be the behavior if we apply window
> -> change element timestamp -> Group By key.
> >>> >>> I think we should always apply window function after changing the
> timestamp of elements. Though this is neither checked nor a recommended
> practice in Beam.
> >>> >>>
> >>> >>> Example pipeline would look like this:
> >>> >>>
> >>> >>>   def applyTimestamp(value):
> >>> >>> 

Re: Please add boyuanzz as an owner/container of apache-beam in PyPi

2020-01-16 Thread Kenneth Knowles
Done

On Thu, Jan 16, 2020 at 5:59 PM Boyuan Zhang  wrote:

> Hey,
>
> I'm Boyuan, currently working on the Beam 2.19.0 release. Can anyone add me as
> an owner of the apache-beam package in PyPI for later pushing artifacts? My id
> is boyuanzz.
>
> Thanks for your help!
>


Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Tomo Suzuki
Hi Beam Committers,
(Andrew, thanks! but I needed to fix tests)

I would appreciate it if somebody could re-trigger precommit checks for
https://github.com/apache/beam/pull/10614 with the following
additional checks:

Run Java PostCommit
Run Java HadoopFormatIO Performance Test
Run BigQueryIO Streaming Performance Test Java
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit

On Thu, Jan 16, 2020 at 4:11 PM Andrew Pilloud  wrote:
>
> done.
>
> On Thu, Jan 16, 2020 at 1:07 PM Tomo Suzuki  wrote:
>>
>> Hi Beam committers,
>>
>> I appreciate if somebody can trigger precommit checks for 
>> https://github.com/apache/beam/pull/10614 with the following additional 
>> checks:
>>
>> Run Java PostCommit
>> Run Java HadoopFormatIO Performance Test
>> Run BigQueryIO Streaming Performance Test Java
>> Run Dataflow ValidatesRunner
>> Run Spark ValidatesRunner
>> Run SQL Postcommit
>>
>> Regards,
>> Tomo
>>


-- 
Regards,
Tomo


Please add boyuanzz as an owner/container of apache-beam in PyPi

2020-01-16 Thread Boyuan Zhang
Hey,

I'm Boyuan, currently working on the Beam 2.19.0 release. Can anyone add me as
an owner of the apache-beam package in PyPI for later pushing artifacts? My id
is boyuanzz.

Thanks for your help!


RE: Re: [RESULT] [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-16 Thread Julian Bruno
Hey Beam Team,

Thanks for your support around this! I will be submitting an individual
contributor license agreement.


http://www.apache.org/licenses/contributor-agreements.html

Cheers!
Julian




On 2020/01/16 19:17:24 Aizhamal Nurmamat kyzy wrote:
> I was going to let Julian answer as he is following this thread, but yes,
> the design will have appropriate licences so we can use and reuse and
> modify it in the future. Julian also expressed willingness to stay active
> in the community to contribute more varieties of the mascot as we need :)
>
> On Thu, Jan 16, 2020 at 8:52 AM Kenneth Knowles wrote:
>
>> *correction: ASF does not require transfer of copyright, only appropriate
>> license
>>
>> On Thu, Jan 16, 2020 at 8:45 AM Kenneth Knowles wrote:
>>
>>> Good question. IMO it is a very good thing to have fun as with the
>>> variety of uses of the Go language mascot.
>>>
>>> Note that copyright and trademark should be clearly separated in this
>>> discussion. These both govern "everyone can draw and adapt".
>>>
>>> Copyright: contributed images owned by ASF, licensed ASL2. You can use
>>> and create derivative works.
>>> Trademark: a mark owned by ASF, protected by ASF and Beam PMC. See
>>> http://www.apache.org/foundation/marks/ and particularly "nominative
>>> use".
>>>
>>> Kenn
>>>
>>> On Tue, Jan 14, 2020 at 1:43 PM Alex Van Boxel wrote:
>>>
>>>> I hope for the mascot will be simple enough so everyone can draw it and
>>>> adapt. The mascot will be license free right... so you don't need to pay
>>>> the graphic artist for every use of the mascot?
>>>>
>>>>  _/
>>>> _/ Alex Van Boxel
>>>>
>>>> On Tue, Jan 14, 2020 at 8:26 PM Aizhamal Nurmamat kyzy <
>>>> aizha...@apache.org> wrote:
>>>>
>>>>> Thanks Kenn for running the vote!
>>>>>
>>>>> I had reached out to a couple designers that I know personally and few
>>>>> in the community to see whether they were willing to contribute the
>>>>> designs.
>>>>>
>>>>> Julian was one of them, who agreed to work with us for a reasonable fee
>>>>> which can be donated by Google. Julian is a very talented visual artist
>>>>> and creates really cool animations too (if we want our Firefly to fly).
>>>>>
>>>>> Here is more about Julian’s work:
>>>>> 2D reel : https://youtu.be/2miCzKbuook
>>>>> linkedin: www.linkedin.com/in/julianbruno
>>>>> artstation: www.artstation.com/jbruno
>>>>>
>>>>> If you all agree to work with him, I will start the process. Here is
>>>>> how it is going to look like:
>>>>>
>>>>> 1. Julian will be sending us a series of sketches of the firefly in
>>>>> the dev@ list, iterating on the version that we like the most
>>>>> 2. If the sketches meet the community’s expectations, he will continue
>>>>> polishing the final design
>>>>> 3. Once the design has been approved, he will give the final touches
>>>>> to it and send us raw files containing the character on whichever file
>>>>> format we want
>>>>>
>>>>> What do you all think?
>>>>>
>>>>> On Fri, Jan 3, 2020 at 9:33 PM Kenneth Knowles wrote:
>>>>>
>>>>>> I am happy to announce that this vote has passed, with 20 approving +1
>>>>>> votes, 5 of which are binding PMC votes.
>>>>>>
>>>>>> Beam's Mascot is the Firefly!
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Fri, Jan 3, 2020 at 9:31 PM Kenneth Knowles wrote:
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
>>>>>>> leonardo.mig...@arquivei.com.br> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Em sex., 13 de dez. de 2019 às 01:58, Kenneth Knowles <
>>>>>>>> k...@apache.org> escreveu:
>>>>>>>>
>>>>>>>>> Please vote on the proposal for Beam's mascot to be the Firefly.
>>>>>>>>> This encompasses the Lampyridae family of insects, without
>>>>>>>>> specifying a genus or species.
>>>>>>>>>
>>>>>>>>> [ ] +1, Approve Firefly being the mascot
>>>>>>>>> [ ] -1, Disapprove Firefly being the mascot
>>>>>>>>>
>>>>>>>>> The vote will be open for at least 72 hours excluding weekends. It
>>>>>>>>> is adopted by at least 3 PMC +1 approval votes, with no PMC -1
>>>>>>>>> disapproval votes*. Non-PMC votes are still encouraged.
>>>>>>>>>
>>>>>>>>> PMC voters, please help by indicating your vote as "(binding)"
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> *I have chosen this format for this vote, even though Beam uses
>>>>>>>>> simple majority as a rule, because I want any PMC member to be able
>>>>>>>>> to veto based on concerns about overlap or trademark.
>>>>>>>>
>>>>>>>> --
>>>>>>>> []s
>>>>>>>> Leonardo Alves Miguel
>>>>>>>> Data Engineer
>>>>>>>> (16) 3509-5515 | www.arquivei.com.br


Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Kenneth Knowles
Regarding Google's advice about shading: don't go to a "one version rule"
monorepo for advice about solving diamond dependencies in the wild.

It is a useful description of the pitfalls. We (and Flink before us, and
likely many more) are already doing something that avoids many of them or
makes them less likely. Building a separate vendored library is simpler and
more robust than the "shade during build" approach.

Ismaël's point #2 is important: we can't shade or vendor Avro if we intend
to use it with user-generated code. Generated code with external
dependencies requires coordination with the vendoring, as we do for
portability + gRPC. The ProtoCoder uses non-vendored proto for this reason.
The one totally internal use of Avro I am aware of is BigQueryIO. This
could perhaps use a vendored Avro. But OTOH it is already in an isolated
module so it is less severe.

And as many have pointed out, upgrading a dep across a breaking change is a
breaking change. "Stop depending on Avro" is a breaking change as well. So
if we are going to do that, moving it out of core is a more valuable
breaking change.

But perhaps highlighting Gleb's comment: we can build a separate
library/artifacts for providing an AvroCoder that uses Avro 1.9.x (and
potentially make a separate one for 1.8.x and encourage users to use that).
We might be able to make Avro 1.8.x optional for the core SDK, finding a
way for a user to pin to 1.9 as long as they don't touch the parts of the
SDK that use 1.8.

Kenn

On Thu, Jan 16, 2020 at 1:49 PM Aaron Dixon  wrote:

> Looks like there's some strategy to get to the right solution here and
> that it may likely involve breaking compatibility.
>
> One option for myself would be to strip the Beam JAR of AvroCoder and
> combine with the old AvroCoder from Beam 2.16 -- this would allow me to
> upgrade Beam but of course is rather hacky.
>
> On second thought, was the breaking change from Beam 2.16->2.17 really
> necessary? If not, could AvroCoder be restored to a 1.9.x "compatible"
> implementation and kept this way for the Beam 2.1x version lineage?
>
> This seems like a somewhat fair ask given the way that I'm suddenly
> blocked --- however I do realize this is somewhat of a technicality; ie,
> Beam 2.16-'s compatibility with my usage of Avro 1.9.x was incidental.
>
> But, still, if the changes to AvroCoder weren't necessary, restoring back
> would unblock me and anyone else using Avro 1.9.x (surely I'm not the only
> one!?)
>
>
> On Thu, Jan 16, 2020 at 12:22 PM Elliotte Rusty Harold 
> wrote:
>
>> Avro does not follow semver. They update the major version when the
>> serialization format changes and the minor version when the API
>> changes in a backwards incompatible way. See
>> https://issues.apache.org/jira/browse/AVRO-2687
>>
>> On Thu, Jan 16, 2020 at 12:50 PM Luke Cwik  wrote:
>> >
>> > Does avro not follow semantic versioning and upgrading to 1.9 should
>> have been backwards compatible or does our usage reach into the internals
>> of avro?
>> >
>> > On Thu, Jan 16, 2020 at 6:16 AM Ismaël Mejía  wrote:
>> >>
>> >> I forgot to explain why the most obvious path (just upgrade Avro to
>> version
>> >> 1.9.x) is not a valid long term solution. Other systems Beam runs on
>> top of
>> >> (e.g.  Spark!) also leak Avro into their core so in the moment Spark
>> moves up
>> >> to Avro 1.9.x Spark runner users will be in a really fragile position
>> where
>> >> things will work until they don't (similar to Aaron's case) so a
>> stronger reason
> >> to get Avro out of Beam core.
>> >>
>> >>
>> >> On Thu, Jan 16, 2020 at 1:59 PM Elliotte Rusty Harold <
>> elh...@ibiblio.org> wrote:
>> >>>
>> >>> Shading should be a last resort:
>> >>>
>> >>> https://jlbp.dev/JLBP-18.html
>> >>>
>> >>> It tends to cause more problems than it solves. At best it's a stopgap
>> >>> measure when you don't have the resources to fix the real problem. In
>> >>> this case it sounds like the real issue is that AVRO is not stable.
>> >>> There are at least three other solutions in a case like this:
>> >>>
>> >>> 1. Fix Avro at the root.
>> >>> 2. Fork Avro and then fix it.
>> >>> 3. Stop depending on Avro.
>> >>>
>> >>> None of these are trivial which is why shading gets considered.
>> >>> However shading doesn't fix the underlying problems, and ultimately
>> >>> makes a product as unreliable as its least reliable dependency. :-(
>> >>>
>> >>> On Thu, Jan 16, 2020 at 2:01 AM jincheng sun <
>> sunjincheng...@gmail.com> wrote:
>> >>> >
>> >>> > I found that there are several dependencies shaded and planned to
>> made as vendored artifacts in [1]. I'm not sure why Avro is not shaded
>> before. From my point of view, it's a good idea to shade Avro and make it a
>> vendored artifact if there are no special reasons blocking us to do that.
>> Regarding to how to create a vendored artifact, you can refer to [2] for
>> more details.
>> >>> >
>> >>> > Best,
>> >>> > Jincheng
>> >>> >
>> >>> > [1] https://issues.apache.org/jira/browse/BEAM-5819
>> >>> 

Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Aaron Dixon
Looks like there's some strategy to get to the right solution here and that
it may likely involve breaking compatibility.

One option for myself would be to strip the Beam JAR of AvroCoder and
combine with the old AvroCoder from Beam 2.16 -- this would allow me to
upgrade Beam but of course is rather hacky.

On second thought, was the breaking change from Beam 2.16->2.17 really
necessary? If not, could AvroCoder be restored to a 1.9.x "compatible"
implementation and kept this way for the Beam 2.1x version lineage?

This seems like a somewhat fair ask given the way that I'm suddenly blocked
--- however I do realize this is somewhat of a technicality; ie, Beam
2.16-'s compatibility with my usage of Avro 1.9.x was incidental.

But, still, if the changes to AvroCoder weren't necessary, restoring back
would unblock me and anyone else using Avro 1.9.x (surely I'm not the only
one!?)


On Thu, Jan 16, 2020 at 12:22 PM Elliotte Rusty Harold 
wrote:

> Avro does not follow semver. They update the major version when the
> serialization format changes and the minor version when the API
> changes in a backwards incompatible way. See
> https://issues.apache.org/jira/browse/AVRO-2687
>
> On Thu, Jan 16, 2020 at 12:50 PM Luke Cwik  wrote:
> >
> > Does avro not follow semantic versioning and upgrading to 1.9 should
> have been backwards compatible or does our usage reach into the internals
> of avro?
> >
> > On Thu, Jan 16, 2020 at 6:16 AM Ismaël Mejía  wrote:
> >>
> >> I forgot to explain why the most obvious path (just upgrade Avro to
> version
> >> 1.9.x) is not a valid long term solution. Other systems Beam runs on
> top of
> >> (e.g.  Spark!) also leak Avro into their core so in the moment Spark
> moves up
> >> to Avro 1.9.x Spark runner users will be in a really fragile position
> where
> >> things will work until they don't (similar to Aaron's case) so a
> stronger reason
> >> to get Avro out of Beam core.
> >>
> >>
> >> On Thu, Jan 16, 2020 at 1:59 PM Elliotte Rusty Harold <
> elh...@ibiblio.org> wrote:
> >>>
> >>> Shading should be a last resort:
> >>>
> >>> https://jlbp.dev/JLBP-18.html
> >>>
> >>> It tends to cause more problems than it solves. At best it's a stopgap
> >>> measure when you don't have the resources to fix the real problem. In
> >>> this case it sounds like the real issue is that AVRO is not stable.
> >>> There are at least three other solutions in a case like this:
> >>>
> >>> 1. Fix Avro at the root.
> >>> 2. Fork Avro and then fix it.
> >>> 3. Stop depending on Avro.
> >>>
> >>> None of these are trivial which is why shading gets considered.
> >>> However shading doesn't fix the underlying problems, and ultimately
> >>> makes a product as unreliable as its least reliable dependency. :-(
> >>>
> >>> On Thu, Jan 16, 2020 at 2:01 AM jincheng sun 
> wrote:
> >>> >
> >>> > I found that there are several dependencies shaded and planned to
> made as vendored artifacts in [1]. I'm not sure why Avro is not shaded
> before. From my point of view, it's a good idea to shade Avro and make it a
> vendored artifact if there are no special reasons blocking us to do that.
> Regarding to how to create a vendored artifact, you can refer to [2] for
> more details.
> >>> >
> >>> > Best,
> >>> > Jincheng
> >>> >
> >>> > [1] https://issues.apache.org/jira/browse/BEAM-5819
> >>> > [2] https://github.com/apache/beam/blob/master/vendor/README.md
> >>> >
> >>> >
> >>> > Tomo Suzuki  于2020年1月16日周四 下午1:18写道:
> >>> >>
> >>> >> I've been upgrading dependencies around gRPC. This Avro-problem is
> >>> >> interesting to me.
> >>> >> I'll study BEAM-8388 more tomorrow.
> >>> >>
> >>> >> On Wed, Jan 15, 2020 at 10:51 PM Luke Cwik 
> wrote:
> >>> >> >
> >>> >> > +Tomo Suzuki +jincheng sun
> >>> >> > There have been a few contributors upgrading the dependencies and
> validating things not breaking by running the majority of the post commit
> integration tests and also using the linkage checker to show that we aren't
> worse off with respect to our dependency tree. Reaching out to them to help
> you is your best bet of getting these upgrades through.
> >>> >> >
> >>> >> > On Wed, Jan 15, 2020 at 6:52 PM Aaron Dixon 
> wrote:
> >>> >> >>
> >>> >> >> I meant to mention that we must use Avro 1.9.x as we rely on
> some schema resolution fixes not present in 1.8.x - so am indeed blocked.
> >>> >> >>
> >>> >> >> On Wed, Jan 15, 2020 at 8:50 PM Aaron Dixon 
> wrote:
> >>> >> >>>
> >>> >> >>> It looks like Avro version dependency from Beam has come up in
> the past [1, 2].
> >>> >> >>>
> >>> >> >>> I'm currently on Beam 2.16.0, which has been compatible with my
> usage of Avro 1.9.x.
> >>> >> >>>
> >>> >> >>> But upgrading to Beam 2.17.0 is not possible for us now that
> 2.17.0 has some dependencies on Avro classes only available in 1.8.x.
> >>> >> >>>
> >>> >> >>> Wondering if anyone else is similar blocked and what it would
> take to prioritize Beam upgrading to 1.9.x or better using a shaded version
> so that clients can use their own 

Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Andrew Pilloud
done.

On Thu, Jan 16, 2020 at 1:07 PM Tomo Suzuki  wrote:

> Hi Beam committers,
>
> I appreciate if somebody can trigger precommit checks for
> https://github.com/apache/beam/pull/10614 with the following additional
> checks:
>
> Run Java PostCommit
> Run Java HadoopFormatIO Performance Test
> Run BigQueryIO Streaming Performance Test Java
> Run Dataflow ValidatesRunner
> Run Spark ValidatesRunner
> Run SQL Postcommit
>
> Regards,
> Tomo
>
>


Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Tomo Suzuki
Hi Beam committers,

I would appreciate it if somebody could trigger precommit checks for
https://github.com/apache/beam/pull/10614 with the following additional
checks:

Run Java PostCommit
Run Java HadoopFormatIO Performance Test
Run BigQueryIO Streaming Performance Test Java
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit

Regards,
Tomo


Re: Apache community contact point

2020-01-16 Thread Hannah Jiang
Here is the Jira ticket I created:
https://issues.apache.org/jira/browse/INFRA-19732
Kenn, I tried to add you as a watcher, but was not able to do so because of
lack of permission. Thanks for your willingness to help :)

On Thu, Jan 16, 2020 at 10:47 AM Kenneth Knowles  wrote:

> They may require a PMC member, so feel free to add me or another. But a
> link to your thread should probably be enough.
>
> On Wed, Jan 15, 2020 at 1:39 PM Hannah Jiang 
> wrote:
>
>> Thanks Andrew, I will try with Jira.
>>
>> On Wed, Jan 15, 2020 at 1:13 PM Andrew Pilloud 
>> wrote:
>>
>>> I'm not sure you have the right contact point. Have you tried filing a
>>> JIRA ticket with the INFRA project and Docker component? JIRA is
>>> usually the best way to get changes made to Apache infrastructure.
>>>
>>> Andrew
>>>
>>> On Wed, Jan 15, 2020 at 1:03 PM Hannah Jiang 
>>> wrote:
>>>
 I am trying to contact the Apache community to deploy Beam images to
 their organization at docker hub. I wrote an email to 
 *d...@community.apache.org
 * and it has been almost 48 hours, but
 haven't received any response.

 To the people who have experience working with them, is this a correct
 contact point? Is there any advice I can follow?

 Thanks,
 Hannah

>>>


Re: Ordering of element timestamp change and window function

2020-01-16 Thread Robert Bradshaw
On Thu, Jan 16, 2020 at 11:00 AM Kenneth Knowles  wrote:
>
> IIRC in Java it is forbidden to output an element with a timestamp outside 
> its current window.

I don't think this is checked anywhere. (Not sure how you would check
it, as there's no generic window containment function--I suppose you
could check if it's past the end of the window (and of course skew
limits how far you can go back). I suppose you could try re-windowing
and then fail if it didn't agree with what was already there.
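For fixed windows, that re-window-and-compare idea can be sketched with plain arithmetic (a sketch only; the helper names are hypothetical, and it mirrors FixedWindows assigning each timestamp to [k*size, (k+1)*size)):

    def fixed_window_for(timestamp, size, offset=0):
        # Start and end of the fixed window containing `timestamp`.
        start = timestamp - (timestamp - offset) % size
        return (start, start + size)

    def rewindow_check(current_window, new_timestamp, size):
        # Fail if re-assigning windows for the new timestamp disagrees with the
        # window the element already carries.
        if fixed_window_for(new_timestamp, size) != current_window:
            raise ValueError('timestamp %s falls outside assigned window %s'
                             % (new_timestamp, current_window))

    rewindow_check((0, 5), 3, 5)    # element re-stamped to t=3 stays in [0, 5)
    # rewindow_check((0, 5), 7, 5)  # would raise: t=7 belongs to [5, 10)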

> An exception is outputs from @FinishBundle, where the output timestamp is 
> required and the window is applied. TBH it seems more of an artifact of a 
> mismatch between the pre-windowing and post-windowing worlds.

Elements are always in some window, even if just the global window.

> Most of the time, mixing processing across windows is simply wrong. But there 
> are fears that calling @FinishBundle once per window would be a performance 
> problem. On the other hand, don't most correct implementations have to 
> separate processing for each window anyhow?

Processing needs to be done per window iff the result depends on the
window or if there are side effects.
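For instance, a DoFn whose output depends on the window has to be run per window; in the Python SDK the window is available via DoFn.WindowParam (a minimal sketch; the output field names are arbitrary):

    import apache_beam as beam

    class TagWithWindowEnd(beam.DoFn):
        # Window-dependent processing: the output embeds the element's window,
        # so results cannot be shared across windows.
        def process(self, element, window=beam.DoFn.WindowParam):
            yield {'value': element, 'window_max_ts': window.max_timestamp()}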

> Anyhow I think the Java behavior is better, so window assignment happens 
> exactly and only at window transforms.

But then one ends up with timestamps that are unrelated to the windows, right?

> Kenn
>
> On Wed, Jan 15, 2020 at 4:59 PM Ankur Goenka  wrote:
>>
>> The case where a plain vanilla value or a windowed value is emitted seems as
>> expected as the user intent is honored without any surprises.
>>
>> If I understand correctly, in the case when the timestamp is changed,
>> applying the window function again can have unintended behavior in the
>> following cases:
>> * Custom windows: User code can be executed in unintended order.
>> * User emits a windowed value in a previous transform: Timestamping the value
>> in this case would overwrite the user-assigned window in an earlier step even
>> when the actual timestamp is the same. Semantically, emitting an element or 
>> a timestamped value with the same timestamp should have the same behaviour.
>>
>> What do you think?
>>
>>
>> On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw  wrote:
>>>
>>> If an element is emitted with a timestamp, the window assignment is
>>> re-applied at that time. At least that's how it is in Python. You can
>>> emit the full windowed value (accepted without checking...), a
>>> timestamped value (in which case the window will be computed), or a
>>> plain old element (in which case the window and timestamp will be
>>> computed (really, propagated)).
>>>
>>> On Wed, Jan 15, 2020 at 3:51 PM Ankur Goenka  wrote:
>>> >
>>> > Yup, this might result in unintended behavior as the timestamp is changed
>>> > after the window assignment, as elements in windows do not have timestamps
>>> > in the window time range.
>>> >
>>> > Shall we start validating at least one window assignment between timestamp
>>> > assignment and GBK/triggers to avoid unintended behaviors mentioned above?
>>> >
>>> > On Wed, Jan 15, 2020 at 1:24 PM Luke Cwik  wrote:
>>> >>
>>> >> Window assignment happens at the point in the pipeline the WindowInto 
>>> >> transform was applied. So in this case the window would have been 
>>> >> assigned using the original timestamp.
>>> >>
>>> >> Grouping is by key and window.
>>> >>
>>> >> On Tue, Jan 14, 2020 at 7:30 PM Ankur Goenka  wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I am not sure about the effect of the order of element timestamp change 
>>> >>> and window association has on a group by key.
>>> >>> More specifically, what would be the behavior if we apply window -> 
>>> >>> change element timestamp -> Group By key.
>>> >>> I think we should always apply window function after changing the 
>>> >>> timestamp of elements. Though this is neither checked nor a recommended 
>>> >>> practice in Beam.
>>> >>>
>>> >>> Example pipeline would look like this:
>>> >>>
>>> >>>   def applyTimestamp(value):
>>> >>> return window.TimestampedValue((key, value), 
>>> >>> int(time.time())
>>> >>>
>>> >>> p \
>>> >>> | 'Create' >> beam.Create(range(0, 10)) \
>>> >>> | 'Fixed Window' >> beam.WindowInto(window.FixedWindows(5)) 
>>> >>> \
>>> >>> | 'Apply Timestamp' >> beam.Map(applyTimestamp) \ # 
>>> >>> Timestamp is changed after windowing and before GBK
>>> >>> | 'Group By Key' >> beam.GroupByKey() \
>>> >>> | 'Print' >> beam.Map(print)
>>> >>>
>>> >>> Thanks,
>>> >>> Ankur
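For reference, a runnable version of the example pipeline quoted above (a sketch: the key `value % 2` is a hypothetical choice so GroupByKey has something to group on, and the unbalanced parenthesis in the quoted snippet is closed):

    import time

    import apache_beam as beam
    from apache_beam.transforms import window

    def apply_timestamp(value):
        # Re-stamp the element after windowing; the key is a hypothetical choice.
        return window.TimestampedValue((value % 2, value), int(time.time()))

    with beam.Pipeline() as p:
        _ = (
            p
            | 'Create' >> beam.Create(range(0, 10))
            | 'Fixed Window' >> beam.WindowInto(window.FixedWindows(5))
            | 'Apply Timestamp' >> beam.Map(apply_timestamp)  # timestamp changed after windowing, before GBK
            | 'Group By Key' >> beam.GroupByKey()
            | 'Print' >> beam.Map(print))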


Re: [RESULT] [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-16 Thread Aizhamal Nurmamat kyzy
I was going to let Julian answer as he is following this thread, but yes,
the design will have appropriate licences so we can use and reuse and
modify it in the future. Julian also expressed willingness to stay active
in the community to contribute more varieties of the mascot as we need :)

On Thu, Jan 16, 2020 at 8:52 AM Kenneth Knowles  wrote:

> *correction: ASF does not require transfer of copyright, only appropriate
> license
>
> On Thu, Jan 16, 2020 at 8:45 AM Kenneth Knowles  wrote:
>
>> Good question. IMO it is a very good thing to have fun as with the
>> variety of uses of the Go language mascot.
>>
>> Note that copyright and trademark should be clearly separated in this
>> discussion. These both govern "everyone can draw and adapt".
>>
>> Copyright: contributed images owned by ASF, licensed ASL2. You can use
>> and create derivative works.
>> Trademark: a mark owned by ASF, protected by ASF and Beam PMC. See
>> http://www.apache.org/foundation/marks/ and particularly "nominative
>> use".
>>
>> Kenn
>>
>> On Tue, Jan 14, 2020 at 1:43 PM Alex Van Boxel  wrote:
>>
>>> I hope for the mascot will be simple enough so everyone can draw it and
>>> adapt. The mascot will be license free right... so you don't need to pay
>>> the graphic artist for every use of the mascot?
>>>
>>>  _/
>>> _/ Alex Van Boxel
>>>
>>>
>>> On Tue, Jan 14, 2020 at 8:26 PM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
 Thanks Kenn for running the vote!

 I had reached out to a couple designers that I know personally and few
 in the community to see whether they were willing to contribute the
 designs.

 Julian was one of them, who agreed to work with us for a reasonable fee
 which can be donated by Google. Julian is a very talented visual artist and
 creates really cool animations too (if we want our Firefly to fly).

 Here is more about Julian’s work:

 2D reel : https://youtu.be/2miCzKbuook

 linkedin: www.linkedin.com/in/julianbruno

 artstation: www.artstation.com/jbruno

 If you all agree to work with him, I will start the process. Here is
 how it is going to look like:


1.

Julian will be sending us a series of sketches of the firefly in
the dev@ list, iterating on the version that we like the most
2.

If the sketches meet the community’s expectations, he will continue
polishing the final design
3.

Once the design has been approved, he will give the final touches
to it and send us raw files containing the character on whichever file
format we want


 What do you all think?


 On Fri, Jan 3, 2020 at 9:33 PM Kenneth Knowles  wrote:

> I am happy to announce that this vote has passed, with 20 approving +1
> votes, 5 of which are binding PMC votes.
>
> Beam's Mascot is the Firefly!
>
> Kenn
>
> On Fri, Jan 3, 2020 at 9:31 PM Kenneth Knowles 
> wrote:
>
>> +1 (binding)
>>
>> On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
>> leonardo.mig...@arquivei.com.br> wrote:
>>
>>> +1
>>>
>>> Em sex., 13 de dez. de 2019 às 01:58, Kenneth Knowles <
>>> k...@apache.org> escreveu:
>>>
 Please vote on the proposal for Beam's mascot to be the Firefly.
 This encompasses the Lampyridae family of insects, without specifying a
 genus or species.

 [ ] +1, Approve Firefly being the mascot
 [ ] -1, Disapprove Firefly being the mascot

 The vote will be open for at least 72 hours excluding weekends. It
 is adopted by at least 3 PMC +1 approval votes, with no PMC -1 
 disapproval
 votes*. Non-PMC votes are still encouraged.

 PMC voters, please help by indicating your vote as "(binding)"

 Kenn

 *I have chosen this format for this vote, even though Beam uses
 simple majority as a rule, because I want any PMC member to be able to 
 veto
 based on concerns about overlap or trademark.

>>>
>>>
>>> --
>>> []s
>>>
>>> Leonardo Alves Miguel
>>> Data Engineer
>>> (16) 3509-5515 | www.arquivei.com.br
>>>
>>


Re: Ordering of element timestamp change and window function

2020-01-16 Thread Kenneth Knowles
IIRC in Java it is forbidden to output an element with a timestamp outside
its current window. An exception is outputs from @FinishBundle, where the
output timestamp is required and the window is applied. TBH it seems more
of an artifact of a mismatch between the pre-windowing and post-windowing
worlds. Most of the time, mixing processing across windows is simply wrong.
But there are fears that calling @FinishBundle once per window would be a
performance problem. On the other hand, don't most correct implementations
have to separate processing for each window anyhow?

Anyhow I think the Java behavior is better, so window assignment happens
exactly and only at window transforms.

Kenn
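Under that behavior, the ordering that avoids surprises is to set element timestamps before the WindowInto, so the window transform sees the final timestamps. A sketch of Ankur's example (quoted below) with the steps reordered, using the same hypothetical `value % 2` keying:

    import time

    import apache_beam as beam
    from apache_beam.transforms import window

    def apply_timestamp(value):
        return window.TimestampedValue((value % 2, value), int(time.time()))

    with beam.Pipeline() as p:
        _ = (
            p
            | 'Create' >> beam.Create(range(0, 10))
            | 'Apply Timestamp' >> beam.Map(apply_timestamp)  # timestamps first
            | 'Fixed Window' >> beam.WindowInto(window.FixedWindows(5))  # then window assignment
            | 'Group By Key' >> beam.GroupByKey()
            | 'Print' >> beam.Map(print))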

On Wed, Jan 15, 2020 at 4:59 PM Ankur Goenka  wrote:

> The case where a plain vanilla value or a windowed value is emitted seems
> as expected as the user intent is honored without any surprises.
>
> If I understand correctly, in the case when the timestamp is changed,
> applying the window function again can have unintended behavior in the
> following cases:
> * Custom windows: User code can be executed in unintended order.
> * User emits a windowed value in a previous transform: Timestamping the
> value in this case would overwrite the user-assigned window in an earlier step
> even when the actual timestamp is the same. Semantically, emitting an
> element or a timestamped value with the same timestamp should have the same
> behaviour.
>
> What do you think?
>
>
> On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw 
> wrote:
>
>> If an element is emitted with a timestamp, the window assignment is
>> re-applied at that time. At least that's how it is in Python. You can
>> emit the full windowed value (accepted without checking...), a
>> timestamped value (in which case the window will be computed), or a
>> plain old element (in which case the window and timestamp will be
>> computed (really, propagated)).
>>
>> On Wed, Jan 15, 2020 at 3:51 PM Ankur Goenka  wrote:
>> >
>> > Yup, this might result in unintended behavior as the timestamp is changed
>> after the window assignment, as elements in windows do not have timestamps in
>> the window time range.
>> >
>> > Shall we start validating at least one window assignment between
>> timestamp assignment and GBK/triggers to avoid unintended behaviors
>> mentioned above?
>> >
>> > On Wed, Jan 15, 2020 at 1:24 PM Luke Cwik  wrote:
>> >>
>> >> Window assignment happens at the point in the pipeline the WindowInto
>> transform was applied. So in this case the window would have been assigned
>> using the original timestamp.
>> >>
>> >> Grouping is by key and window.
>> >>
>> >> On Tue, Jan 14, 2020 at 7:30 PM Ankur Goenka 
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am not sure about the effect of the order of element timestamp
>> change and window association has on a group by key.
>> >>> More specifically, what would be the behavior if we apply window ->
>> change element timestamp -> Group By key.
>> >>> I think we should always apply window function after changing the
>> timestamp of elements. Though this is neither checked nor a recommended
>> practice in Beam.
>> >>>
>> >>> Example pipeline would look like this:
>> >>>
>> >>>   def applyTimestamp(value):
>> >>> return window.TimestampedValue((key, value),
>> int(time.time())
>> >>>
>> >>> p \
>> >>> | 'Create' >> beam.Create(range(0, 10)) \
>> >>> | 'Fixed Window' >>
>> beam.WindowInto(window.FixedWindows(5)) \
>> >>> | 'Apply Timestamp' >> beam.Map(applyTimestamp) \ #
>> Timestamp is changed after windowing and before GBK
>> >>> | 'Group By Key' >> beam.GroupByKey() \
>> >>> | 'Print' >> beam.Map(print)
>> >>>
>> >>> Thanks,
>> >>> Ankur
>>
>


Re: Apache community contact point

2020-01-16 Thread Kenneth Knowles
They may require a PMC member, so feel free to add me or another. But a
link to your thread should probably be enough.

On Wed, Jan 15, 2020 at 1:39 PM Hannah Jiang  wrote:

> Thanks Andrew, I will try with Jira.
>
> On Wed, Jan 15, 2020 at 1:13 PM Andrew Pilloud 
> wrote:
>
>> I'm not sure you have the right contact point. Have you tried filing a
>> JIRA ticket with the INFRA project and Docker component? JIRA is
>> usually the best way to get changes made to Apache infrastructure.
>>
>> Andrew
>>
>> On Wed, Jan 15, 2020 at 1:03 PM Hannah Jiang 
>> wrote:
>>
>>> I am trying to contact the Apache community to deploy Beam images to
>>> their organization at docker hub. I wrote an email to 
>>> *d...@community.apache.org
>>> * and it has been almost 48 hours, but
>>> haven't received any response.
>>>
>>> To the people who have experience working with them, is this a correct
>>> contact point? Is there any advice I can follow?
>>>
>>> Thanks,
>>> Hannah
>>>
>>


Re: [PROPOSAL] Transition released containers to the official ASF dockerhub organization

2020-01-16 Thread Kenneth Knowles
+1 very nice explanation

On Wed, Jan 15, 2020 at 1:57 PM Ahmet Altay  wrote:

> +1 - Thank you for driving this!
>
> On Wed, Jan 15, 2020 at 1:55 PM Thomas Weise  wrote:
>
>> +1 for the namespace proposal.
>>
>> It is similar to github repos. Top-level is the org, then single level
>> for repo (beam-abc, beam-xzy, ..)
>>
>>
>>
>> On Wed, Jan 15, 2020 at 1:45 PM Robert Bradshaw 
>> wrote:
>>
>>> Various tags of the same image should at least logically be the same
>>> thing, so I agree that we should not be trying to share a single
>>> repository in that way. A full suite of apache/beam-{image_desc}
>>> repositories, if apache is fine with that, seems like the best
>>> approach.
>>>
>>> On Wed, Jan 15, 2020 at 1:32 PM Kyle Weaver  wrote:
>>> >
>>> > +1, agree that moving current image name to tags is a non-starter.
>>> Thanks for driving this Hannah. Let us know what they say about repo
>>> creation.
>>> >
>>> > On Wed, Jan 15, 2020 at 1:16 PM Udi Meiri  wrote:
>>> >>
>>> >> SG +1
>>> >>
>>> >> On Wed, Jan 15, 2020 at 12:59 PM Hannah Jiang 
>>> wrote:
>>> >>>
>>> >>> I have done some research about images released under apache
>>> namespace at docker hub, and here is my proposal.
>>> >>>
>>> >>> Currently, we are using apachebeam as our namespace and each image
>>> has its own repository. Version number is used to tag the images.
>>> >>> ie: apachebeam/python2.7_sdk:2.19.0,
>>> apachebeam/flink1.9_job_server:2.19.0
>>> >>>
>>> >>> Now we are migrating to apache namespace and docker hub doesn't
>>> support nested repository names, so we cannot use
>>> apache/beam/{image-desc}:{version}.
>>> >>> Instead, I propose to use apache/beam-{image_desc}:{version} as our
>>> repository name.
>>> >>> ie: apache/beam-python2.7_sdk:2.19.0,
>>> apache/beam-flink1.9_job_server:2.19.0
>>> >>> => When a user searches for apache/beam at docker hub, it will list
>>> all the repositories we deployed with apache/beam-, so no concerns that
>>> some released images are missed by users.
>>> >>> => Repository names give insights to the users which repositories
>>> they should use.
>>> >>> => A downside with this approach is that we need to create a new
>>> >>> repository whenever we release a new image; the time and effort needed for this
>>> >>> is still pending, and I am contacting the Apache Docker Hub management team.
>>> >>>
>>> >>> I have considered using beam as repository name and moving image
>>> name and version to tags, (ie: apache/beam:python3.7_sdk_2.19.0), which
>>> means put all images to a single repository, however, this approach has
>>> some downsides.
>>> >>> => When a user searches for apache/beam, only one repository is
>>> returned. Users need to use tags to identify which images they should use.
>>> Since we release images with new tags for each version, it will overwhelm
>>> the users and give them an impression that the images are not organized
>>> well. It's also difficult to know what kind of images we deployed.
>>> >>> => With both image name and version included at tags, it is a little
>>> bit more complicated to maintain the code.
>>> >>> => There is no correct answer which image the latest tag should
>>> point to.
>>> >>>
>>> >>> Are there any concerns with this proposal?
>>> >>>
>>> >>> Thanks,
>>> >>> Hannah
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Fri, Jan 10, 2020 at 4:19 PM Ahmet Altay 
>>> wrote:
>>> 
>>> 
>>> 
>>>  On Fri, Jan 10, 2020 at 3:33 PM Ahmet Altay 
>>> wrote:
>>> >
>>> >
>>> >
>>> > On Fri, Jan 10, 2020 at 3:32 PM Ankur Goenka 
>>> wrote:
>>> >>
>>> >> Also curious to know if apache provide any infra support fro
>>> projects under Apache umbrella and any quota limits they might have.
>>> 
>>> 
>>>  Maybe Hannah can ask with an infra ticket?
>>> 
>>> >>
>>> >>
>>> >> On Fri, Jan 10, 2020, 2:26 PM Robert Bradshaw <
>>> rober...@google.com> wrote:
>>> >>>
>>> >>> One downside is that, unlike many of these projects, we release a
>>> >>> dozen or so containers. Is there exactly (and only) one level of
>>> >>> namespacing/nesting we can leverage here? (This isn't a blocker,
>>> but
>>> >>> something to consider.)
>>> >
>>> >
>>> > After a quick search, I could not find a way to use more than one
>>> level of repositories. We can use the naming scheme we currently use to
>>> help with. Our repositories are named as apachebeam/X, we could start using
>>> apache/beam/X.
>>> >
>>> >>>
>>> >>>
>>> >>> On Fri, Jan 10, 2020 at 2:06 PM Hannah Jiang <
>>> hannahji...@google.com> wrote:
>>> >>> >
>>> >>> > Thanks Ahmet for proposing it.
>>> >>> > I will take it and work towards v2.19.
>>> 
>>> 
>>>  Missed this part. Thank you Hannah!
>>> 
>>> >>>
>>> >>> >
>>> >>> > Hannah
>>> >>> >
>>> >>> > On Fri, Jan 10, 2020 at 1:50 PM Kyle Weaver <
>>> kcwea...@google.com> wrote:
>>> >>> >>
>>> >>> >> It'd be nice to have the clout/official sheen of apache
>>> attached 

Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Elliotte Rusty Harold
Avro does not follow semver. They update the major version when the
serialization format changes and the minor version when the API
changes in a backwards incompatible way. See
https://issues.apache.org/jira/browse/AVRO-2687

On Thu, Jan 16, 2020 at 12:50 PM Luke Cwik  wrote:
>
> Does avro not follow semantic versioning and upgrading to 1.9 should have 
> been backwards compatible or does our usage reach into the internals of avro?
>
> On Thu, Jan 16, 2020 at 6:16 AM Ismaël Mejía  wrote:
>>
>> I forgot to explain why the most obvious path (just upgrade Avro to version
>> 1.9.x) is not a valid long term solution. Other systems Beam runs on top of
>> (e.g.  Spark!) also leak Avro into their core so in the moment Spark moves up
>> to Avro 1.9.x Spark runner users will be in a really fragile position where
>> things will work until they don't (similar to Aaron's case) so a stronger 
>> reason
>> to get Avro out of Beam core.
>>
>>
>> On Thu, Jan 16, 2020 at 1:59 PM Elliotte Rusty Harold  
>> wrote:
>>>
>>> Shading should be a last resort:
>>>
>>> https://jlbp.dev/JLBP-18.html
>>>
>>> It tends to cause more problems than it solves. At best it's a stopgap
>>> measure when you don't have the resources to fix the real problem. In
>>> this case it sounds like the real issue is that AVRO is not stable.
>>> There are at least three other solutions in a case like this:
>>>
>>> 1. Fix Avro at the root.
>>> 2. Fork Avro and then fix it.
>>> 3. Stop depending on Avro.
>>>
>>> None of these are trivial which is why shading gets considered.
>>> However shading doesn't fix the underlying problems, and ultimately
>>> makes a product as unreliable as its least reliable dependency. :-(
>>>
>>> On Thu, Jan 16, 2020 at 2:01 AM jincheng sun  
>>> wrote:
>>> >
>>> > I found that there are several dependencies shaded and planned to made as 
>>> > vendored artifacts in [1]. I'm not sure why Avro is not shaded before. 
>>> > From my point of view, it's a good idea to shade Avro and make it a 
>>> > vendored artifact if there are no special reasons blocking us to do that. 
>>> > Regarding to how to create a vendored artifact, you can refer to [2] for 
>>> > more details.
>>> >
>>> > Best,
>>> > Jincheng
>>> >
>>> > [1] https://issues.apache.org/jira/browse/BEAM-5819
>>> > [2] https://github.com/apache/beam/blob/master/vendor/README.md
>>> >
>>> >
>>> > Tomo Suzuki  于2020年1月16日周四 下午1:18写道:
>>> >>
>>> >> I've been upgrading dependencies around gRPC. This Avro-problem is
>>> >> interesting to me.
>>> >> I'll study BEAM-8388 more tomorrow.
>>> >>
>>> >> On Wed, Jan 15, 2020 at 10:51 PM Luke Cwik  wrote:
>>> >> >
>>> >> > +Tomo Suzuki +jincheng sun
>>> >> > There have been a few contributors upgrading the dependencies and 
>>> >> > validating things not breaking by running the majority of the post 
>>> >> > commit integration tests and also using the linkage checker to show 
>>> >> > that we aren't worse off with respect to our dependency tree. Reaching 
>>> >> > out to them to help you is your best bet of getting these upgrades 
>>> >> > through.
>>> >> >
>>> >> > On Wed, Jan 15, 2020 at 6:52 PM Aaron Dixon  wrote:
>>> >> >>
>>> >> >> I meant to mention that we must use Avro 1.9.x as we rely on some 
>>> >> >> schema resolution fixes not present in 1.8.x - so am indeed blocked.
>>> >> >>
>>> >> >> On Wed, Jan 15, 2020 at 8:50 PM Aaron Dixon  wrote:
>>> >> >>>
>>> >> >>> It looks like Avro version dependency from Beam has come up in the 
>>> >> >>> past [1, 2].
>>> >> >>>
>>> >> >>> I'm currently on Beam 2.16.0, which has been compatible with my 
>>> >> >>> usage of Avro 1.9.x.
>>> >> >>>
>>> >> >>> But upgrading to Beam 2.17.0 is not possible for us now that 2.17.0 
>>> >> >>> has some dependencies on Avro classes only available in 1.8.x.
>>> >> >>>
>>> >> >>> Wondering if anyone else is similar blocked and what it would take 
>>> >> >>> to prioritize Beam upgrading to 1.9.x or better using a shaded 
>>> >> >>> version so that clients can use their own Avro version for their own 
>>> >> >>> coding purposes. (Eg, I parse Avro messages from a KafkaIO source 
>>> >> >>> and need 1.9.x for this but am perfectly happy if Beam's Avro coding 
>>> >> >>> facilities used a shaded other version.)
>>> >> >>>
>>> >> >>> I've made a comment on BEAM-8388 [1] to this effect. But polling 
>>> >> >>> community for discussion.
>>> >> >>>
>>> >> >>> [1] https://issues.apache.org/jira/browse/BEAM-8388
>>> >> >>> [2] https://github.com/apache/beam/pull/9779
>>> >> >>>
>>> >>
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Tomo
>>>
>>>
>>>
>>> --
>>> Elliotte Rusty Harold
>>> elh...@ibiblio.org



-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Rehman Murad Ali
Thanks Ismaël. The Java PreCommit failed due to a timeout. Can you rerun it,
please?

https://github.com/apache/beam/pull/10316



*Thanks & Regards*



*Rehman Murad Ali*
Software Engineer
Mobile: +92 3452076766
Skype: rehman.muradali


On Thu, Jan 16, 2020 at 7:17 PM Ismaël Mejía  wrote:

> done
>
> On Thu, Jan 16, 2020 at 8:08 AM Rehman Murad Ali <
> rehman.murad...@venturedive.com> wrote:
>
>> Hi,
>>
>> I appreciate if someone can run the mentioned job for this PR.
>> https://github.com/apache/beam/pull/10316
>>
>> Run Java Flink PortableValidatesRunner Streaming
>>
>>
>> *Thanks & Regards*
>>
>>
>>
>> *Rehman Murad Ali*
>> Software Engineer
>> Mobile: +92 3452076766
>> Skype: rehman.muradali
>>
>>
>> On Thu, Jan 16, 2020 at 3:44 AM Andrew Pilloud 
>> wrote:
>>
>>> Done.
>>>
>>> Infra shut our .adf.yaml file off for being too large. Updates are here:
>>> https://issues.apache.org/jira/browse/INFRA-19670
>>>
>>> On Wed, Jan 15, 2020 at 2:40 PM Tomo Suzuki  wrote:
>>>
 Hi Beam committers,

 Can somebody trigger the precommit checks for my new PR
 https://github.com/apache/beam/pull/10603 ?

 This PR still does not trigger the checks. I confirmed that my account
 is in the .adf.yaml.

 On Tue, Jan 14, 2020 at 9:48 PM Ahmet Altay  wrote:
 >
 > Done.
 >
 > +Kenneth Knowles, any updates from INFRA on this?
 >
 > On Tue, Jan 14, 2020 at 6:43 PM Tomo Suzuki 
 wrote:
 >>
 >> It hit Dataflow quota error again. Can somebody run
 >> Run Dataflow ValidatesRunner
 >> for https://github.com/apache/beam/pull/10554 ?
 >>
 >> On Tue, Jan 14, 2020 at 12:14 PM Tomo Suzuki 
 wrote:
 >> >
 >> > Valentyn, thank you.
 >> >
 >> > On Tue, Jan 14, 2020 at 12:05 PM Valentyn Tymofieiev
 >> >  wrote:
 >> > >
 >> > > Done. If tests still don't trigger, you could try to make a push
 to the branch to reset the test status.
 >> > >
 >> > > On Tue, Jan 14, 2020 at 8:38 AM Tomo Suzuki 
 wrote:
 >> > >>
 >> > >> Hi Beam developers,
 >> > >>
 >> > >> Can somebody run the following to
 https://github.com/apache/beam/pull/10554 ?
 >> > >> Run Dataflow ValidatesRunner
 >> > >> Run Java PreCommit
 >> > >>
 >> > >> On Mon, Jan 13, 2020 at 2:35 PM Tomo Suzuki 
 wrote:
 >> > >> >
 >> > >> > Thank you, Mark and Ismaël.
 >> > >> >
 >> > >> > On Mon, Jan 13, 2020 at 2:34 PM Mark Liu 
 wrote:
 >> > >> > >
 >> > >> > > done
 >> > >> > >
 >> > >> > > On Mon, Jan 13, 2020 at 8:03 AM Tomo Suzuki <
 suzt...@google.com> wrote:
 >> > >> > >>
 >> > >> > >> Thanks Yifan (but Java Precommit is still missing).
 >> > >> > >> Can somebody run "Run Java PreCommit" on
 >> > >> > >> https://github.com/apache/beam/pull/10554?
 >> > >> > >>
 >> > >> > >>
 >> > >> > >> On Mon, Jan 13, 2020 at 2:59 AM Yifan Zou <
 yifan...@google.com> wrote:
 >> > >> > >> >
 >> > >> > >> > done.
 >> > >> > >> >
 >> > >> > >> > On Sun, Jan 12, 2020 at 6:27 PM Tomo Suzuki <
 suzt...@google.com> wrote:
 >> > >> > >> >>
 >> > >> > >> >> Hi Beam committers,
 >> > >> > >> >>
 >> > >> > >> >> Four Jenkins jobs did not report back for this PR
 >> > >> > >> >> https://github.com/apache/beam/pull/10554 .
 >> > >> > >> >> Can somebody trigger them?
 >> > >> > >> >>
 >> > >> > >> >> On Fri, Jan 10, 2020 at 4:51 PM Andrew Pilloud <
 apill...@google.com> wrote:
 >> > >> > >> >> >
 >> > >> > >> >> > Done.
 >> > >> > >> >> >
 >> > >> > >> >> > On Fri, Jan 10, 2020 at 12:59 PM Tomo Suzuki <
 suzt...@google.com> wrote:
 >> > >> > >> >> >>
 >> > >> >> >> Hi Beam developers,
 >> > >> > >> >> >>
 >> > >> > >> >> >> I appreciate a committer can trigger precommit build
 for
 >> > >> > >> >> >> https://github.com/apache/beam/pull/10554.
 >> > >> > >> >> >>
 >> > >> > >> >> >> In addition to normal precommit checks, I want the
 followings:
 >> > >> > >> >> >> Run Java PostCommit
 >> > >> > >> >> >> Run Java HadoopFormatIO Performance Test
 >> > >> > >> >> >> Run BigQueryIO Streaming Performance Test Java
 >> > >> > >> >> >> Run Dataflow ValidatesRunner
 >> > >> > >> >> >> Run Spark ValidatesRunner
 >> > >> > >> >> >> Run SQL Postcommit
 >> > >> > >> >> >>
 >> > >> > >> >> >> Regards,
 >> > >> > >> >> >> Tomo
 >> > >> > >> >>
 >> > >> > >> >>
 >> > >> > >> >>
 >> > >> > >> >> --
 >> > >> > >> >> Regards,
 >> > >> > >> >> Tomo
 >> > >> > >>
 >> > >> > >>
 >> > >> > >>
 >> > >> > >> --
 >> > >> > >> Regards,
 >> > >> > >> Tomo
 >> > >> >
 >> > >> >
 >> > >> >
 >> > >> > --
 >> > >> > Regards,
 >> > >> > Tomo
 >> > >>
 >> > >>
 >> > >>
 >> > >> --
 >> > >> Regards,
 >> > >> Tomo
 >> >
 >> >
 >> >
 >> > --
 >> > 

Re: [PROPOSAL] Leveraging SQL TableProviders for Cross-Language IOs

2020-01-16 Thread Kenneth Knowles
Nice! This is quite clever.

Kenn

On Mon, Jan 13, 2020 at 5:08 PM Chamikara Jayalath 
wrote:

> Thanks Brian. Added some comments.
>
> On Mon, Jan 13, 2020 at 2:25 PM Brian Hulette  wrote:
>
>> Hi everyone,
>> I have a proposal that I think can unify two problem sets:
>>   1) adding more IOs for Beam SQL, and
>>   2) making more (Row-based) Java IOs available in Python as
>> cross-language transforms
>>
>> The basic idea is to create a single cross-language transform that
>> exposes all Beam SQL IOs via the TableProvider interface. A design document
>> is available here: https://s.apache.org/xlang-table-provider
>>
>> Please take a look and let me know what you think. Thanks!
>> Brian
>>
>


Re: [RESULT] [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-16 Thread Kenneth Knowles
*correction: ASF does not require transfer of copyright, only appropriate
license

On Thu, Jan 16, 2020 at 8:45 AM Kenneth Knowles  wrote:

> Good question. IMO it is a very good thing to have fun as with the variety
> of uses of the Go language mascot.
>
> Note that copyright and trademark should be clearly separated in this
> discussion. These both govern "everyone can draw and adapt".
>
> Copyright: contributed images owned by ASF, licensed ASL2. You can use and
> create derivative works.
> Trademark: a mark owned by ASF, protected by ASF and Beam PMC. See
> http://www.apache.org/foundation/marks/ and particularly "nominative use".
>
> Kenn
>
> On Tue, Jan 14, 2020 at 1:43 PM Alex Van Boxel  wrote:
>
>> I hope for the mascot will be simple enough so everyone can draw it and
>> adapt. The mascot will be license free right... so you don't need to pay
>> the graphic artist for every use of the mascot?
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Tue, Jan 14, 2020 at 8:26 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Thanks Kenn for running the vote!
>>>
>>> I had reached out to a couple designers that I know personally and few
>>> in the community to see whether they were willing to contribute the
>>> designs.
>>>
>>> Julian was one of them, who agreed to work with us for a reasonable fee
>>> which can be donated by Google. Julian is a very talented visual artist and
>>> creates really cool animations too (if we want our Firefly to fly).
>>>
>>> Here is more about Julian’s work:
>>>
>>> 2D reel : https://youtu.be/2miCzKbuook
>>>
>>> linkedin: www.linkedin.com/in/julianbruno
>>>
>>> artstation: www.artstation.com/jbruno
>>>
>>> If you all agree to work with him, I will start the process. Here is how
>>> it is going to look like:
>>>
>>>
>>>1.
>>>
>>>Julian will be sending us a series of sketches of the firefly in the
>>>dev@ list, iterating on the version that we like the most
>>>2.
>>>
>>>If the sketches meet the community’s expectations, he will continue
>>>polishing the final design
>>>3.
>>>
>>>Once the design has been approved, he will give the final touches to
>>>it and send us raw files containing the character on whichever file 
>>> format
>>>we want
>>>
>>>
>>> What do you all think?
>>>
>>>
>>> On Fri, Jan 3, 2020 at 9:33 PM Kenneth Knowles  wrote:
>>>
 I am happy to announce that this vote has passed, with 20 approving +1
 votes, 5 of which are binding PMC votes.

 Beam's Mascot is the Firefly!

 Kenn

 On Fri, Jan 3, 2020 at 9:31 PM Kenneth Knowles  wrote:

> +1 (binding)
>
> On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
> leonardo.mig...@arquivei.com.br> wrote:
>
>> +1
>>
>> Em sex., 13 de dez. de 2019 às 01:58, Kenneth Knowles <
>> k...@apache.org> escreveu:
>>
>>> Please vote on the proposal for Beam's mascot to be the Firefly.
>>> This encompasses the Lampyridae family of insects, without specifying a
>>> genus or species.
>>>
>>> [ ] +1, Approve Firefly being the mascot
>>> [ ] -1, Disapprove Firefly being the mascot
>>>
>>> The vote will be open for at least 72 hours excluding weekends. It
>>> is adopted by at least 3 PMC +1 approval votes, with no PMC -1 
>>> disapproval
>>> votes*. Non-PMC votes are still encouraged.
>>>
>>> PMC voters, please help by indicating your vote as "(binding)"
>>>
>>> Kenn
>>>
>>> *I have chosen this format for this vote, even though Beam uses
>>> simple majority as a rule, because I want any PMC member to be able to 
>>> veto
>>> based on concerns about overlap or trademark.
>>>
>>
>>
>> --
>> []s
>>
>> Leonardo Alves Miguel
>> Data Engineer
>> (16) 3509-5515 | www.arquivei.com.br
>>
>


Re: [RESULT] [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2020-01-16 Thread Kenneth Knowles
Good question. IMO it is a very good thing to have fun as with the variety
of uses of the Go language mascot.

Note that copyright and trademark should be clearly separated in this
discussion. These both govern "everyone can draw and adapt".

Copyright: contributed images owned by ASF, licensed ASL2. You can use and
create derivative works.
Trademark: a mark owned by ASF, protected by ASF and Beam PMC. See
http://www.apache.org/foundation/marks/ and particularly "nominative use".

Kenn

On Tue, Jan 14, 2020 at 1:43 PM Alex Van Boxel  wrote:

> I hope for the mascot will be simple enough so everyone can draw it and
> adapt. The mascot will be license free right... so you don't need to pay
> the graphic artist for every use of the mascot?
>
>  _/
> _/ Alex Van Boxel
>
>
> On Tue, Jan 14, 2020 at 8:26 PM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Thanks Kenn for running the vote!
>>
>> I had reached out to a couple of designers that I know personally and a few
>> in the community to see whether they were willing to contribute the designs.
>>
>> Julian was one of them; he agreed to work with us for a reasonable fee,
>> which Google can cover as a donation. Julian is a very talented visual artist and
>> creates really cool animations too (if we want our Firefly to fly).
>>
>> Here is more about Julian’s work:
>>
>> 2D reel : https://youtu.be/2miCzKbuook
>>
>> linkedin: www.linkedin.com/in/julianbruno
>>
>> artstation: www.artstation.com/jbruno
>>
>> If you all agree to work with him, I will start the process. Here is how
>> it is going to look:
>>
>>
>>1.
>>
>>Julian will be sending us a series of sketches of the firefly in the
>>dev@ list, iterating on the version that we like the most
>>2.
>>
>>If the sketches meet the community’s expectations, he will continue
>>polishing the final design
>>3.
>>
>>Once the design has been approved, he will give the final touches to
>>it and send us raw files containing the character on whichever file format
>>we want
>>
>>
>> What do you all think?
>>
>>
>> On Fri, Jan 3, 2020 at 9:33 PM Kenneth Knowles  wrote:
>>
>>> I am happy to announce that this vote has passed, with 20 approving +1
>>> votes, 5 of which are binding PMC votes.
>>>
>>> Beam's Mascot is the Firefly!
>>>
>>> Kenn
>>>
>>> On Fri, Jan 3, 2020 at 9:31 PM Kenneth Knowles  wrote:
>>>
 +1 (binding)

 On Tue, Dec 17, 2019 at 12:30 PM Leonardo Miguel <
 leonardo.mig...@arquivei.com.br> wrote:

> +1
>
> On Fri, Dec 13, 2019 at 01:58, Kenneth Knowles 
> wrote:
>
>> Please vote on the proposal for Beam's mascot to be the Firefly. This
>> encompasses the Lampyridae family of insects, without specifying a genus 
>> or
>> species.
>>
>> [ ] +1, Approve Firefly being the mascot
>> [ ] -1, Disapprove Firefly being the mascot
>>
>> The vote will be open for at least 72 hours excluding weekends. It is
>> adopted by at least 3 PMC +1 approval votes, with no PMC -1 disapproval
>> votes*. Non-PMC votes are still encouraged.
>>
>> PMC voters, please help by indicating your vote as "(binding)"
>>
>> Kenn
>>
>> *I have chosen this format for this vote, even though Beam uses
>> simple majority as a rule, because I want any PMC member to be able to 
>> veto
>> based on concerns about overlap or trademark.
>>
>
>
> --
> []s
>
> Leonardo Alves Miguel
> Data Engineer
> (16) 3509-5515 | www.arquivei.com.br
>



Bangalore / Bengaluru Meetup

2020-01-16 Thread Austin Bennett
Hi Dev and Users,

Also, we hope to kick off a meetup in India this year.
https://www.meetup.com/Bangalore-Apache-Beam/

Please let us know if you'd like to get involved (speaking, hosting,
etc.). Reply to me, privately or on the thread, and/or use this survey link:
https://forms.gle/cud39eh3FA1em7EU7 (thanks to @Tanay Tummalapalli for
compiling it).

And naturally, sign up on Meetup if you are interested in attending, as that
is where most of the messages on that topic will appear.

Cheers,
Austin


Re: Jenkins jobs not running for my PR 10438

2020-01-16 Thread Ismaël Mejía
done

On Thu, Jan 16, 2020 at 8:08 AM Rehman Murad Ali <
rehman.murad...@venturedive.com> wrote:

> Hi,
>
> I would appreciate it if someone could run the mentioned job for this PR.
> https://github.com/apache/beam/pull/10316
>
> Run Java Flink PortableValidatesRunner Streaming
>
>
> *Thanks & Regards*
>
>
>
> *Rehman Murad Ali*
> Software Engineer
> Mobile: +92 3452076766
> Skype: rehman.muradali
>
>
> On Thu, Jan 16, 2020 at 3:44 AM Andrew Pilloud 
> wrote:
>
>> Done.
>>
>> Infra shut our .asf.yaml file off for being too large. Updates are here:
>> https://issues.apache.org/jira/browse/INFRA-19670
>>
>> On Wed, Jan 15, 2020 at 2:40 PM Tomo Suzuki  wrote:
>>
>>> Hi Beam committers,
>>>
>>> Can somebody trigger the precommit checks for my new PR
>>> https://github.com/apache/beam/pull/10603 ?
>>>
>>> This PR still does not trigger the checks. I confirmed that my account
>>> is in the .asf.yaml.
>>>
>>> On Tue, Jan 14, 2020 at 9:48 PM Ahmet Altay  wrote:
>>> >
>>> > Done.
>>> >
>>> > +Kenneth Knowles, any updates from INFRA on this?
>>> >
>>> > On Tue, Jan 14, 2020 at 6:43 PM Tomo Suzuki 
>>> wrote:
>>> >>
>>> >> It hit Dataflow quota error again. Can somebody run
>>> >> Run Dataflow ValidatesRunner
>>> >> for https://github.com/apache/beam/pull/10554 ?
>>> >>
>>> >> On Tue, Jan 14, 2020 at 12:14 PM Tomo Suzuki 
>>> wrote:
>>> >> >
>>> >> > Valentyn, thank you.
>>> >> >
>>> >> > On Tue, Jan 14, 2020 at 12:05 PM Valentyn Tymofieiev
>>> >> >  wrote:
>>> >> > >
>>> >> > > Done. If tests still don't trigger, you could try to make a push
>>> to the branch to reset the test status.
>>> >> > >
>>> >> > > On Tue, Jan 14, 2020 at 8:38 AM Tomo Suzuki 
>>> wrote:
>>> >> > >>
>>> >> > >> Hi Beam developers,
>>> >> > >>
>>> >> > >> Can somebody run the following to
>>> https://github.com/apache/beam/pull/10554 ?
>>> >> > >> Run Dataflow ValidatesRunner
>>> >> > >> Run Java PreCommit
>>> >> > >>
>>> >> > >> On Mon, Jan 13, 2020 at 2:35 PM Tomo Suzuki 
>>> wrote:
>>> >> > >> >
>>> >> > >> > Thank you, Mark and Ismaël.
>>> >> > >> >
>>> >> > >> > On Mon, Jan 13, 2020 at 2:34 PM Mark Liu 
>>> wrote:
>>> >> > >> > >
>>> >> > >> > > done
>>> >> > >> > >
>>> >> > >> > > On Mon, Jan 13, 2020 at 8:03 AM Tomo Suzuki <
>>> suzt...@google.com> wrote:
>>> >> > >> > >>
>>> >> > >> > >> Thanks Yifan (but Java Precommit is still missing).
>>> >> > >> > >> Can somebody run "Run Java PreCommit" on
>>> >> > >> > >> https://github.com/apache/beam/pull/10554?
>>> >> > >> > >>
>>> >> > >> > >>
>>> >> > >> > >> On Mon, Jan 13, 2020 at 2:59 AM Yifan Zou <
>>> yifan...@google.com> wrote:
>>> >> > >> > >> >
>>> >> > >> > >> > done.
>>> >> > >> > >> >
>>> >> > >> > >> > On Sun, Jan 12, 2020 at 6:27 PM Tomo Suzuki <
>>> suzt...@google.com> wrote:
>>> >> > >> > >> >>
>>> >> > >> > >> >> Hi Beam committers,
>>> >> > >> > >> >>
>>> >> > >> > >> >> Four Jenkins jobs did not report back for this PR
>>> >> > >> > >> >> https://github.com/apache/beam/pull/10554 .
>>> >> > >> > >> >> Can somebody trigger them?
>>> >> > >> > >> >>
>>> >> > >> > >> >> On Fri, Jan 10, 2020 at 4:51 PM Andrew Pilloud <
>>> apill...@google.com> wrote:
>>> >> > >> > >> >> >
>>> >> > >> > >> >> > Done.
>>> >> > >> > >> >> >
>>> >> > >> > >> >> > On Fri, Jan 10, 2020 at 12:59 PM Tomo Suzuki <
>>> suzt...@google.com> wrote:
>>> >> > >> > >> >> >>
>>> >> > >> >> >> Hi Beam developers,
>>> >> > >> > >> >> >>
>>> >> > >> >> >> I would appreciate it if a committer could trigger the
>>> >> > >> >> >> precommit build for
>>> >> > >> > >> >> >> https://github.com/apache/beam/pull/10554.
>>> >> > >> > >> >> >>
>>> >> > >> > >> >> >> In addition to normal precommit checks, I want the
>>> followings:
>>> >> > >> > >> >> >> Run Java PostCommit
>>> >> > >> > >> >> >> Run Java HadoopFormatIO Performance Test
>>> >> > >> > >> >> >> Run BigQueryIO Streaming Performance Test Java
>>> >> > >> > >> >> >> Run Dataflow ValidatesRunner
>>> >> > >> > >> >> >> Run Spark ValidatesRunner
>>> >> > >> > >> >> >> Run SQL Postcommit
>>> >> > >> > >> >> >>
>>> >> > >> > >> >> >> Regards,
>>> >> > >> > >> >> >> Tomo
>>> >> > >> > >> >>
>>> >> > >> > >> >>
>>> >> > >> > >> >>
>>> >> > >> > >> >> --
>>> >> > >> > >> >> Regards,
>>> >> > >> > >> >> Tomo
>>> >> > >> > >>
>>> >> > >> > >>
>>> >> > >> > >>
>>> >> > >> > >> --
>>> >> > >> > >> Regards,
>>> >> > >> > >> Tomo
>>> >> > >> >
>>> >> > >> >
>>> >> > >> >
>>> >> > >> > --
>>> >> > >> > Regards,
>>> >> > >> > Tomo
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> --
>>> >> > >> Regards,
>>> >> > >> Tomo
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Regards,
>>> >> > Tomo
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Tomo
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>


Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Ismaël Mejía
I forgot to explain why the most obvious path (just upgrading Avro to version
1.9.x) is not a valid long-term solution. Other systems Beam runs on top of
(e.g. Spark!) also leak Avro into their core, so the moment Spark moves up to
Avro 1.9.x, Spark runner users will be in a really fragile position where
things will work until they don't (similar to Aaron's case). That is a
stronger reason to get Avro out of Beam core.
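
As a short-term, user-side workaround for anyone caught in the middle today,
here is a minimal Gradle sketch of pinning a single Avro version across the
dependency graph. The coordinates and version are assumptions for
illustration, and this does nothing about the binary incompatibilities
described above; it only makes the resolved artifact consistent.

    // Sketch only, not a Beam recommendation: force one Avro version so that
    // Beam, the runner (e.g. Spark) and user code resolve the same artifact.
    configurations.all {
      resolutionStrategy {
        force 'org.apache.avro:avro:1.9.2'  // assumed version, adjust as needed
      }
    }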


On Thu, Jan 16, 2020 at 1:59 PM Elliotte Rusty Harold 
wrote:

> Shading should be a last resort:
>
> https://jlbp.dev/JLBP-18.html
>
> It tends to cause more problems than it solves. At best it's a stopgap
> measure when you don't have the resources to fix the real problem. In
> this case it sounds like the real issue is that Avro is not stable.
> There are at least three other solutions in a case like this:
>
> 1. Fix Avro at the root.
> 2. Fork Avro and then fix it.
> 3. Stop depending on Avro.
>
> None of these are trivial which is why shading gets considered.
> However shading doesn't fix the underlying problems, and ultimately
> makes a product as unreliable as its least reliable dependency. :-(
>
> On Thu, Jan 16, 2020 at 2:01 AM jincheng sun 
> wrote:
> >
> > I found that there are several dependencies shaded and planned to be made
> as vendored artifacts in [1]. I'm not sure why Avro was not shaded before.
> From my point of view, it's a good idea to shade Avro and make it a
> vendored artifact if there are no special reasons blocking us to do that.
> Regarding to how to create a vendored artifact, you can refer to [2] for
> more details.
> >
> > Best,
> > Jincheng
> >
> > [1] https://issues.apache.org/jira/browse/BEAM-5819
> > [2] https://github.com/apache/beam/blob/master/vendor/README.md
> >
> >
> > Tomo Suzuki  wrote on Thu, Jan 16, 2020 at 1:18 PM:
> >>
> >> I've been upgrading dependencies around gRPC. This Avro-problem is
> >> interesting to me.
> >> I'll study BEAM-8388 more tomorrow.
> >>
> >> On Wed, Jan 15, 2020 at 10:51 PM Luke Cwik  wrote:
> >> >
> >> > +Tomo Suzuki +jincheng sun
> >> > There have been a few contributors upgrading the dependencies and
> validating things not breaking by running the majority of the post commit
> integration tests and also using the linkage checker to show that we aren't
> worse off with respect to our dependency tree. Reaching out to them to help
> you is your best bet of getting these upgrades through.
> >> >
> >> > On Wed, Jan 15, 2020 at 6:52 PM Aaron Dixon 
> wrote:
> >> >>
> >> >> I meant to mention that we must use Avro 1.9.x as we rely on some
> schema resolution fixes not present in 1.8.x - so am indeed blocked.
> >> >>
> >> >> On Wed, Jan 15, 2020 at 8:50 PM Aaron Dixon 
> wrote:
> >> >>>
> >> >>> It looks like Avro version dependency from Beam has come up in the
> past [1, 2].
> >> >>>
> >> >>> I'm currently on Beam 2.16.0, which has been compatible with my
> usage of Avro 1.9.x.
> >> >>>
> >> >>> But upgrading to Beam 2.17.0 is not possible for us now that 2.17.0
> has some dependencies on Avro classes only available in 1.8.x.
> >> >>>
> >> >>> Wondering if anyone else is similarly blocked and what it would take
> to prioritize Beam upgrading to 1.9.x or better using a shaded version so
> that clients can use their own Avro version for their own coding purposes.
> (Eg, I parse Avro messages from a KafkaIO source and need 1.9.x for this
> but am perfectly happy if Beam's Avro coding facilities used a shaded other
> version.)
> >> >>>
> >> >>> I've made a comment on BEAM-8388 [1] to this effect. But polling
> community for discussion.
> >> >>>
> >> >>> [1] https://issues.apache.org/jira/browse/BEAM-8388
> >> >>> [2] https://github.com/apache/beam/pull/9779
> >> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Tomo
>
>
>
> --
> Elliotte Rusty Harold
> elh...@ibiblio.org
>


Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Elliotte Rusty Harold
Shading should be a last resort:

https://jlbp.dev/JLBP-18.html

It tends to cause more problems than it solves. At best it's a stopgap
measure when you don't have the resources to fix the real problem. In
this case it sounds like the real issue is that Avro is not stable.
There are at least three other solutions in a case like this:

1. Fix Avro at the root.
2. Fork Avro and then fix it.
3. Stop depending on Avro.

None of these are trivial which is why shading gets considered.
However shading doesn't fix the underlying problems, and ultimately
makes a product as unreliable as its least reliable dependency. :-(
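
For readers less familiar with what shading/relocation means in practice, a
minimal Gradle Shadow plugin sketch follows; the relocated package name is
made up for illustration.

    // Illustrative only: relocate Avro into a Beam-private namespace so the
    // copy bundled with Beam cannot clash with the user's Avro on the classpath.
    shadowJar {
      relocate 'org.apache.avro', 'org.apache.beam.vendor.avro'
    }

The catch raised elsewhere in this thread is that user code, including
SpecificRecord classes generated by the Avro compiler, still references the
original org.apache.avro packages, so a relocated copy inside Beam cannot
interoperate with those classes.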

On Thu, Jan 16, 2020 at 2:01 AM jincheng sun  wrote:
>
> I found that there are several dependencies shaded and planned to be made as 
> vendored artifacts in [1]. I'm not sure why Avro was not shaded before. From 
> my point of view, it's a good idea to shade Avro and make it a vendored 
> artifact if there are no special reasons blocking us to do that. Regarding to 
> how to create a vendored artifact, you can refer to [2] for more details.
>
> Best,
> Jincheng
>
> [1] https://issues.apache.org/jira/browse/BEAM-5819
> [2] https://github.com/apache/beam/blob/master/vendor/README.md
>
>
> Tomo Suzuki  wrote on Thu, Jan 16, 2020 at 1:18 PM:
>>
>> I've been upgrading dependencies around gRPC. This Avro-problem is
>> interesting to me.
>> I'll study BEAM-8388 more tomorrow.
>>
>> On Wed, Jan 15, 2020 at 10:51 PM Luke Cwik  wrote:
>> >
>> > +Tomo Suzuki +jincheng sun
>> > There have been a few contributors upgrading the dependencies and 
>> > validating things not breaking by running the majority of the post commit 
>> > integration tests and also using the linkage checker to show that we 
>> > aren't worse off with respect to our dependency tree. Reaching out to them 
>> > to help you is your best bet of getting these upgrades through.
>> >
>> > On Wed, Jan 15, 2020 at 6:52 PM Aaron Dixon  wrote:
>> >>
>> >> I meant to mention that we must use Avro 1.9.x as we rely on some schema 
>> >> resolution fixes not present in 1.8.x - so am indeed blocked.
>> >>
>> >> On Wed, Jan 15, 2020 at 8:50 PM Aaron Dixon  wrote:
>> >>>
>> >>> It looks like Avro version dependency from Beam has come up in the past 
>> >>> [1, 2].
>> >>>
>> >>> I'm currently on Beam 2.16.0, which has been compatible with my usage of 
>> >>> Avro 1.9.x.
>> >>>
>> >>> But upgrading to Beam 2.17.0 is not possible for us now that 2.17.0 has 
>> >>> some dependencies on Avro classes only available in 1.8.x.
>> >>>
>> >>> Wondering if anyone else is similarly blocked and what it would take to 
>> >>> prioritize Beam upgrading to 1.9.x or better using a shaded version so 
>> >>> that clients can use their own Avro version for their own coding 
>> >>> purposes. (Eg, I parse Avro messages from a KafkaIO source and need 
>> >>> 1.9.x for this but am perfectly happy if Beam's Avro coding facilities 
>> >>> used a shaded other version.)
>> >>>
>> >>> I've made a comment on BEAM-8388 [1] to this effect. But polling 
>> >>> community for discussion.
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/BEAM-8388
>> >>> [2] https://github.com/apache/beam/pull/9779
>> >>>
>>
>>
>> --
>> Regards,
>> Tomo



-- 
Elliotte Rusty Harold
elh...@ibiblio.org


Re: Jenkins job execution policy

2020-01-16 Thread Katarzyna Kucharczyk
Hi all,

Thanks for starting this thread. I have another question about this policy
change.

I don't know if you also noticed that the behaviour of phrase triggering has
become really unpredictable since the policy changed. What usually happens is
that after a "retest this please" comment, no running tests are shown on
GitHub, but after checking Jenkins they turn out to have started there.
Today I experienced the very same behaviour. What's more, after "retest this
please" finished, I commented on the PR with "run seed job", which triggered
the whole set of tests again.
This mainly slows down reviewing and triggering tests for someone, and the
redundant test runs may drain resources for other users.

Are you also experiencing these strange behaviours? Or do you have any
solution for making phrase triggering behave correctly?
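
For reference, since this policy relies on the Jenkins PR whitelist in
.asf.yaml discussed in the quoted thread below, here is a minimal sketch of
what that section can look like. The key names are my assumption and should
be verified against the INFRA documentation linked further down.

    # Illustrative sketch only; key names assumed, check the INFRA docs.
    jenkins:
      github_whitelist:
        - some-github-username      # hypothetical contributor accounts
        - another-contributor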

Thanks,
Kasia

On Wed, Jan 15, 2020 at 9:48 AM Michał Walenia 
wrote:

> Thanks for adding the whitelist!
> I have the same issue as Kirill: the tests run when I push commits, but
> phrase triggering works in a strange way - the jobs don't run after a
> comment, but only after a push following the comment. Is there a ghprb
> config that was changed, limiting the range of GitHub triggers for the jobs?
> Michal
>
> On Wed, Jan 15, 2020 at 1:55 AM Kirill Kozlov 
> wrote:
>
>> Thanks for working on this!
>>
>> I have noticed that tests run for new PRs and force-pushed commits, but
>> if a test fails due to a flake I am unable to re-run it (ex: "Run Java
>> PreCommit").
>> PR that has this issue: https://github.com/apache/beam/pull/10369.
>> Is this intended behaviour?
>>
>> -
>> Kirill
>>
>> On Tue, Jan 14, 2020 at 3:20 PM Luke Cwik  wrote:
>>
>>> Does the approval list live beyond the lifetime of the Jenkins machine
>>> (my initial impression is that the approval list disappears on Jenkins
>>> machine restart)?
>>>
>>> Also, I imagine that ASF wants an explicit way to see who is approved
>>> and who is denied, which the plugin doesn't seem to allow.
>>>
>>> On Tue, Jan 14, 2020 at 3:11 PM Pablo Estrada 
>>> wrote:
>>>
 I've merged https://github.com/apache/beam/pull/10582 to unblock
 existing contributors that are having trouble getting their PRs tested
 without committer help. We can discuss Kai's suggestion.

 Looking at https://github.com/jenkinsci/ghprb-plugin, it seems like
 the 'add to whitelist' comment adds contributors permanently to a
 whitelist. This would have more immediate results than the .asf.yaml file.
 It would be harder to track who has the privilege, but it doesn't sound
 like that concerns us, right?

 Thoughts from others?
 -P.

 On Tue, Jan 14, 2020 at 1:43 PM Kai Jiang  wrote:

> Nice! I took a look at the Beam Jenkins job properties
> (CommonJobProperties.groovy#L108-L111) and it uses the jenkinsci/ghprb-plugin.
> It should support the "add to whitelist" comment from a committer on a PR
> for adding new contributors to the whitelist.
> Adding a GitHub account to the .asf.yaml might be a little heavy if this
> approach works. Could we also test this method?
>
> Best,
> Kai
>
>
> On Tue, Jan 14, 2020 at 1:16 PM Pablo Estrada 
> wrote:
>
>> I've added all the PR authors for the last 1000 merged PRs. I will
>> merge in a few minutes. I'll have a follow up change to document this on
>> the website.
>>
>> On Tue, Jan 14, 2020 at 11:29 AM Luke Cwik  wrote:
>>
>>> Should we scrape all past contributors and add them to the file?
>>>
>>> On Tue, Jan 14, 2020 at 11:18 AM Kenneth Knowles 
>>> wrote:
>>>
 Nice! This will help at least temporarily. We can see if it grows
 too unwieldy. It is still unfriendly to newcomers.

 Kenn

 On Tue, Jan 14, 2020 at 11:06 AM Pablo Estrada 
 wrote:

> Hi all,
> ASF INFRA gave us a middle-ground sort of workaround for this by
> using .asf.yaml files. Here's a change to implement it[1], and
> documentation for the .asf.yaml file[2], as well as the relevant 
> section
> for our case[3].
>
> I'll check the docs in [2] well before pushing to merge, just to
> be sure we're not breaking anything.
>
> [1] https://github.com/apache/beam/pull/10582
> [2]
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
>
> [3]
> https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories#id-.asf.yamlfeaturesforgitrepositories-JenkinsPRWhitelisting
>
> On Mon, Jan 13, 2020 at 3:29 PM Luke Cwik 
> wrote:
>
>> I'm for going back to the status quo where anyone's PR ran the
>> 

Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Gleb Kanterov
Adding to Ismaël: I see moving Avro out of the core and keeping
compatibility as non-exclusive options. Of course, it would require more
effort on our side.

On Thu, Jan 16, 2020 at 12:29 PM Ismaël Mejía  wrote:

> For those interested, there was also some extra context in the discussion at:
> https://github.com/apache/beam/pull/9779
>
> Gleb mentioned the two key points:
>
> 1. The fact that Avro is exposed in the user API in beam-sdks-java-core was
>    a mistake and makes fixing this issue backwards incompatible.
>
> 2. Shading is not an option because the Avro compiler would generate specific
>    records against the non-vendored version, so we would break user records
>    compatibility (for example for users with a schema registry).
>
> So, unless I am missing something and someone can give an alternative, we are
> in a situation where the only solution to the issue is to do (1), move Avro
> out of core as an extension, but then the question is whether we would accept
> breaking backwards compatibility for this issue. I am in the 'we should do
> it' camp. What do others think?
>
>
> On Thu, Jan 16, 2020 at 10:17 AM Gleb Kanterov  wrote:
>
>> There are significant changes between Avro 1.8 and Avro 1.9. I'm not sure
>> it's possible for beam-sdks-java-core to support both at the same time. The
>> fact that AvroIO is a part of the beam-sdks-java-core doesn't make it
>> simpler. However, I can see how we can build two binary artifacts with the
>> same user-facing API, each supporting its own version of Avro.
>>
>> Shading or vendoring would be a breaking change because public signatures
>> of AvroIO (and few other IOs, for instance, BigQueryIO) refer to classes
>> from Avro, for instance, GenericRecord. Furthermore, a lot of Beam users
>> use Avro compiler to generate Java code for SpecificRecord, which would
>> refer to non-vendored version.
>>
>


Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Ismaël Mejía
For those interested, there was also some extra context in the discussion at:
https://github.com/apache/beam/pull/9779

Gleb mentioned the two key points:

1. The fact that Avro is exposed in the user API in beam-sdks-java-core was a
   mistake and makes fixing this issue backwards incompatible.

2. Shading is not an option because the Avro compiler would generate specific
   records against the non-vendored version, so we would break user records
   compatibility (for example for users with a schema registry).

So, unless I am missing something and someone can give an alternative, we are
in a situation where the only solution to the issue is to do (1), move Avro
out of core as an extension, but then the question is whether we would accept
breaking backwards compatibility for this issue. I am in the 'we should do it'
camp. What do others think?


On Thu, Jan 16, 2020 at 10:17 AM Gleb Kanterov  wrote:

> There are significant changes between Avro 1.8 and Avro 1.9. I'm not sure
> it's possible for beam-sdks-java-core to support both at the same time. The
> fact that AvroIO is a part of the beam-sdks-java-core doesn't make it
> simpler. However, I can see how we can build two binary artifacts with the
> same user-facing API each supporting own version of Avro.
>
> Shading or vendoring would be a breaking change because public signatures
> of AvroIO (and few other IOs, for instance, BigQueryIO) refer to classes
> from Avro, for instance, GenericRecord. Furthermore, a lot of Beam users
> use Avro compiler to generate Java code for SpecificRecord, which would
> refer to non-vendored version.
>


Re: Beam's Avro 1.8.x dependency

2020-01-16 Thread Gleb Kanterov
There are significant changes between Avro 1.8 and Avro 1.9. I'm not sure
it's possible for beam-sdks-java-core to support both at the same time. The
fact that AvroIO is a part of the beam-sdks-java-core doesn't make it
simpler. However, I can see how we can build two binary artifacts with the
same user-facing API, each supporting its own version of Avro.

Shading or vendoring would be a breaking change because public signatures
of AvroIO (and few other IOs, for instance, BigQueryIO) refer to classes
from Avro, for instance, GenericRecord. Furthermore, a lot of Beam users
use Avro compiler to generate Java code for SpecificRecord, which would
refer to non-vendored version.
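
To make the coupling concrete, here is a minimal Java sketch of how Avro types
appear directly in Beam's public API today; the record schema and bucket path
are made up for illustration.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.AvroIO;
    import org.apache.beam.sdk.values.PCollection;

    public class AvroLeakExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create();
        // The schema is parsed with the *user's* Avro version.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Example\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        // org.apache.avro.Schema and GenericRecord appear in AvroIO's public
        // signatures, so Beam core and user code must agree on the Avro
        // version; a vendored (relocated) Avro inside Beam would not match.
        PCollection<GenericRecord> records =
            p.apply(AvroIO.readGenericRecords(schema)
                .from("gs://my-bucket/input-*.avro"));
        p.run().waitUntilFinish();
      }
    }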