Re: Jenkins jobs not running for my PR 10438

2020-05-11 Thread rahul patwari
Hi,

Can you please trigger pre-commit checks for
https://github.com/apache/beam/pull/11581

Thanks,
Rahul

On Tue, May 12, 2020 at 7:12 AM Ahmet Altay  wrote:

> Done for both Yoshiki and Tomo's PRs.
>
> On Mon, May 11, 2020 at 6:33 PM Tomo Suzuki  wrote:
>
>> Hi Beam committers,
>>
>> Would you run the basic precommit checks for
>> https://github.com/apache/beam/pull/11674 ?
>>
>> Regards,
>> Tomo
>>
>


Re: Apache Beam application to Season of Docs 2020

2020-05-11 Thread Pablo Estrada
I'll be happy to mentor for improvements to the Capability Matrix. There
had been some discussions about that earlier.
+Kenneth Knowles +Robert Bradshaw +Thomas Weise: I may set up a quick
meeting with each of you to get your thoughts about Capability Matrix
improvements. (Also happy to welcome an extra mentor if you feel up to it. :))

On Mon, May 11, 2020 at 5:16 PM Kyle Weaver  wrote:

> Cool! I signed up as a mentor.
>
> On Mon, May 11, 2020 at 4:56 PM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> PS: I am registered as a mentor too, happy to onboard the tech writers to
>> The Apache Way and Beam community processes when time comes.
>>
>> On Mon, May 11, 2020 at 1:52 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Apache Beam got accepted into Season of Docs, yay! [1]
>>>
>>> @Kyle/Pablo, could you please fill out the mentor registration form by
>>> tomorrow morning [2]?
>>>
>>> Is there anyone else interested in becoming a mentor for either of the 2
>>> projects [3]? The program requires at least two open source mentors for
>>> each technical writer in case we get 2.
>>>
>>> [1] https://developers.google.com/season-of-docs/docs/participants/
>>> [2]
>>> https://docs.google.com/forms/d/e/1FAIpQLSfMZ3yCf24PFUbvpSOkVy4sZUJFY5oS7HbGyXTNXzFg2btp4Q/viewform
>>> [3]
>>> https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>>>
>>>
>>> On Tue, May 5, 2020 at 5:31 PM Pablo Estrada  wrote:
>>>
 I think I am willing to help with Project 2.

 On Tue, May 5, 2020 at 2:01 PM Aizhamal Nurmamat kyzy <
 aizha...@apache.org> wrote:

> Thank you, Kyle! really appreciate it! I will add you as a mentor into
> the cwiki page, and let you know if Beam gets accepted to the program on
> May 11th.
>
> On Tue, May 5, 2020 at 12:55 PM Kyle Weaver 
> wrote:
>
>> Thanks Aizhamal! I would be willing to help with project 1.
>>
>> On Mon, May 4, 2020 at 5:04 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> I have submitted the application to the Season of Docs program with
>>> the project ideas we have developed last year [1]. I learnt about its
>>> deadline a few hours ago and didn't want to miss it.
>>>
>>> Feel free to add more project ideas (or edit the current ones) until
>>> May 7th.
>>>
>>> If Beam gets approved, we will get 1 or 2 experienced technical
>>> writers to help us document community processes or some Beam features. 
>>> Is
>>> anyone else willing to mentor for these projects?
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>>>
>>


Re: Jenkins jobs not running for my PR 10438

2020-05-11 Thread Ahmet Altay
Done for both Yoshiki and Tomo's PRs.

On Mon, May 11, 2020 at 6:33 PM Tomo Suzuki  wrote:

> Hi Beam committers,
>
> Would you run the basic precommit checks for
> https://github.com/apache/beam/pull/11674 ?
>
> Regards,
> Tomo
>


Re: Jenkins jobs not running for my PR 10438

2020-05-11 Thread Tomo Suzuki
Hi Beam committers,

Would you run the basic precommit checks for
https://github.com/apache/beam/pull/11674 ?

Regards,
Tomo


Re: Apache Beam application to Season of Docs 2020

2020-05-11 Thread Kyle Weaver
Cool! I signed up as a mentor.

On Mon, May 11, 2020 at 4:56 PM Aizhamal Nurmamat kyzy 
wrote:

> PS: I am registered as a mentor too, happy to onboard the tech writers to
> The Apache Way and Beam community processes when time comes.
>
> On Mon, May 11, 2020 at 1:52 PM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Apache Beam got accepted into Season of Docs, yay! [1]
>>
>> @Kyle/Pablo, could you please fill out the mentor registration form by
>> tomorrow morning [2]?
>>
>> Is there anyone else interested in becoming a mentor for either of the 2
>> projects [3]? The program requires at least two open source mentors for
>> each technical writer in case we get 2.
>>
>> [1] https://developers.google.com/season-of-docs/docs/participants/
>> [2]
>> https://docs.google.com/forms/d/e/1FAIpQLSfMZ3yCf24PFUbvpSOkVy4sZUJFY5oS7HbGyXTNXzFg2btp4Q/viewform
>> [3]
>> https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>>
>>
>> On Tue, May 5, 2020 at 5:31 PM Pablo Estrada  wrote:
>>
>>> I think I am willing to help with Project 2.
>>>
>>> On Tue, May 5, 2020 at 2:01 PM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
 Thank you, Kyle! really appreciate it! I will add you as a mentor into
 the cwiki page, and let you know if Beam gets accepted to the program on
 May 11th.

 On Tue, May 5, 2020 at 12:55 PM Kyle Weaver 
 wrote:

> Thanks Aizhamal! I would be willing to help with project 1.
>
> On Mon, May 4, 2020 at 5:04 PM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> Hi all,
>>
>> I have submitted the application to the Season of Docs program with
>> the project ideas we have developed last year [1]. I learnt about its
>> deadline a few hours ago and didn't want to miss it.
>>
>> Feel free to add more project ideas (or edit the current ones) until
>> May 7th.
>>
>> If Beam gets approved, we will get 1 or 2 experienced technical
>> writers to help us document community processes or some Beam features. Is
>> anyone else willing to mentor for these projects?
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>>
>


Re: [DISCUSS] finishBundle once per window

2020-05-11 Thread Robert Bradshaw
StartBundle pre-dates setUp, which makes it less useful than it once was. With
DoFn re-use, however, startBundle can still be used to ensure the DoFn starts
each bundle in a clean state.
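
(Editor's note: the thread above is about the Java SDK's @StartBundle/@FinishBundle annotations; as a hedged illustration of the lifecycle points being debated, here is a minimal sketch using the Python SDK's equivalent hooks: one-time init in setup(), per-bundle reset in start_bundle(), and finish_bundle() treated as a "flush". The make_client factory and its write_batch/close methods are hypothetical placeholders, not anything from this thread.)

```python
import apache_beam as beam


class BatchingWriteFn(beam.DoFn):
    """Sketch only: lazy init, per-bundle reset, and finish_bundle as a flush."""

    def __init__(self, make_client, batch_size=100):
        # `make_client` is a hypothetical factory for some external sink client.
        self._make_client = make_client
        self._batch_size = batch_size
        self._client = None
        self._buffer = None

    def setup(self):
        # One-time, potentially expensive initialization; the "lazy init"
        # mentioned in the thread could equally live here.
        self._client = self._make_client()

    def start_bundle(self):
        # DoFn instances are re-used across bundles, so reset per-bundle
        # state here so each bundle starts clean.
        self._buffer = []

    def process(self, element):
        self._buffer.append(element)
        if len(self._buffer) >= self._batch_size:
            self._client.write_batch(self._buffer)
            self._buffer = []

    def finish_bundle(self):
        # "Flush": push whatever is left in the buffer to the external sink.
        # Emitting elements downstream from finish_bundle is exactly where the
        # window/timestamp questions in this thread arise; writing to an
        # external client sidesteps that.
        if self._buffer:
            self._client.write_batch(self._buffer)
            self._buffer = []

    def teardown(self):
        if self._client is not None:
            self._client.close()
```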

On Thu, May 7, 2020 at 2:03 PM Reuven Lax  wrote:

> I think startBundle is useful for convenience and performance, but not
> necessarily needed semantically (as Kenn said, you could write your
> pipeline without startBundle). finishBundle has a stronger semantic
> meaning when interpreted as a way of finalizing elements.
>
> On Thu, May 7, 2020 at 2:00 PM Luke Cwik  wrote:
>
>> Start bundle is useful since the framework provides the necessary
>> synchronization while using lazy init requires you to write it yourself and
>> also pay for it on each process element call.
>>
>> On Wed, May 6, 2020 at 8:46 AM Kenneth Knowles  wrote:
>>
>>> This is a great idea. I thought that (long ago) we decided not to
>>> execute finishBundle per window for these reasons: (1) perf fears (2)
>>> naming bikeshed (3) backwards compatibility (even though we hadn't
>>> stabilized, it is pervasive). That was before the annotation-driven DoFn I
>>> believe, so we didn't have the ability to do it this way. Now this seems
>>> like a clear win.
>>>
>>> Regarding @StartBundle: I always use and advise others to use lazy init
>>> instead of @StartBundle and to think of @FinishBundle as "flush".
>>> State-sensitive APIs like "start(); process(); finish()" are usually an
>>> anti-pattern since you can almost always write them in a less dangerous way
>>> (try-with-resources, Python context managers, etc). Conveniently, this
>>> eliminates any consideration of symmetry. Can anyone refresh me on
>>> when/whether it is important to have @StartBundle instead of running the
>>> same code via lazy init?
>>>
>>>
>>> On Tue, May 5, 2020 at 3:16 PM Robert Bradshaw 
>>> wrote:
>>>
 On Tue, May 5, 2020 at 3:08 PM Reuven Lax  wrote:
 >
 > On Tue, May 5, 2020 at 2:58 PM Robert Bradshaw 
 wrote:
 >>
 >> On Mon, May 4, 2020 at 11:08 AM Reuven Lax  wrote:
 >> >
 >> > This should not affect the ability of the user to specify the
 output timestamp.
 >>
 >> My question was whether we would require it.
 >
 >
 > My current PR does not - it defaults to end-of-window as the
 timestamp. However we could also decide to require it.

 I'd be more comfortable requiring it for the time being.

>>>
>>> +1 for requiring it
>>>
>>> Kenn
>>>
>>> >> On Mon, May 4, 2020 at 11:48 AM Jan Lukavský  wrote:
 >> >
 >> > There was a mention in some other thread, that in order to make
 user experience as predictable as possible, we should try to make windows
 idempotent, and once window is assigned, it should be never changed (and
 timestamp move outside of the scope of window, unless a different windowfn
 is applied). Because all Beam window functions are actually time based, and
 output timestamp is known, what is the issue of applying windowfn to
 elements output from @FinishBundle and assign the windows automatically?
 >>
 >> We used to do exactly this. (I don't recall why it was removed.) If
 >> the input element and/or window was queried by the WindowFn
 >> (admittedly rare), it would fail at runtime.
 >
 > When did we used to do this? We've had users writing WindowFns that
 queried the input element since long before Beam existed.  e.g a window fn
 that inspected a userId field, and created different sized windows based on
 the userId.

 This is how it started. In particular, a WindowFn.AssignContext would be
 created that threw an exception on accessing the unavailable fields
 (which would make finishBundle unsuitable for such WindowFns).

>>>
>>> Yea this was not great. This also broke the equivalence between WithKeys
>>> and AssignWindows. It was really a workaround for the lack of the feature
>>> Reuven is proposing.
>>>
>>>
>>>
 >> On Tue, May 5, 2020 at 2:51 PM Reuven Lax  wrote:
 >> >
 >> > It's a good question about startBundle - it's something I thought
 about. The problem is that a runner doesn't always know at startBundle what
 windows are in the bundle, and even if it does know it might require the
 runner to run two passes over the bundle to figure this out. Alternatively
 the runner could keep calling startBundle the first time it's seen a new
 window in the bundle, but I think that makes things even weirder. It's also
 worth noting that startBundle is already more limited today - we do not
 support calling output from startBundle, but we do allow calling output
 from finishBundle.
 >> >
 >> > Reuven
 >> >
 >> > On Mon, May 4, 2020 at 11:59 PM Jan Lukavský 
 wrote:
 >> >>
 >> >> Ah, interesting. That makes windowFn non-idempotent by
 definition, because its first application (e.g. global window -> interval
 window) _might_ yield different result than 

Re: JIRA priorities explaination

2020-05-11 Thread Kenneth Knowles
This is filed as https://issues.apache.org/jira/browse/INFRA-20231, which
should take effect now.

On Fri, May 1, 2020 at 3:53 PM Ahmet Altay  wrote:

> +1 sounds good to me. Oftentimes I confused the relative priorities of
> critical/blocker/major.
>
> On Fri, May 1, 2020 at 3:05 PM Tyson Hamilton  wrote:
>
>> Proposal sounds good to me! The tool tips will be fantastic.
>>
>> On Fri, May 1, 2020 at 3:03 PM Robert Bradshaw 
>> wrote:
>>
>>> On Fri, May 1, 2020 at 2:34 PM Kenneth Knowles  wrote:
>>> >
>>> > Coming back to this thread (again!)
>>> >
>>> > I wrote up https://beam.apache.org/contribute/jira-priorities/ and
>>> https://beam.apache.org/contribute/release-blockers/ and I have had
>>> success communicating using these docs.
>>> >
>>> > However, some people get confused because the existing Jira priorities
>>> have tooltips that say something slightly different [1], or they just don't
>>> discover the site.
>>> >
>>> > Since Jira 7.6.0, I think, it is possible to customize this in Jira
>>> directly. [2]
>>> >
>>> > What do you think about changing from the default priorities to just
>>> P0, P1, etc, and using these tooltips that match the docs on the Beam site?
>>> >
>>> > P0 - Outage blocking development and/or testing work; requires
>>> immediate and continuous attention
>>> > P1 - Critical bug: data loss, total loss of function, or loss of
>>> testing signal due to test failures or flakiness
>>> > P2 - Default priority. Will be triaged and planned according to
>>> community practices.
>>> > P3 - Non-urgent bugs, features, and improvements
>>> > P4 - Trivial items, spelling errors, etc.
>>> >
>>> > This is related to the "Automation for Jira" thread. It was suggested
>>> to automatically lower priorities of stale bugs, to match reality and let
>>> us focus on the bugs that remain at higher priorities. I hope automatically
>>> moving "P2" to "P3" with these tooltips is nicer for people than
>>> automatically moving "Major" to "Minor". Using the default words seems like
>>> you are telling the user their problem is minor.
>>>
>>> That's a great point, +1.
>>>
>>> >
>>> > Kenn
>>> >
>>> > [1]
>>> https://issues.apache.org/jira/secure/ShowConstantsHelp.jspa?decorator=popup#PriorityLevels
>>> > [2] https://jira.atlassian.com/browse/JRASERVER-3821
>>> >
>>> > On Fri, Oct 25, 2019 at 4:25 PM Pablo Estrada 
>>> wrote:
>>> >>
>>> >> That SGTM
>>> >>
>>> >> On Fri, Oct 25, 2019 at 4:18 PM Robert Bradshaw 
>>> wrote:
>>> >>>
>>> >>> +1 to both.
>>> >>>
>>> >>> On Fri, Oct 25, 2019 at 3:58 PM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>> >>> >
>>> >>> > On Fri, Oct 25, 2019 at 3:39 PM Kenneth Knowles 
>>> wrote:
>>> >>> >>
>>> >>> >> Suppose, hypothetically, we say that if Fix Version is set, then
>>> P0/Blocker and P1/Critical block release and lower priorities get bumped.
>>> >>> >
>>> >>> >
>>> >>> > +1 to Kenn's suggestion.  In addition, we can discourage setting
>>> Fix version for non-critical issues before issues are fixed.
>>> >>> >
>>> >>> >>
>>> >>> >>
>>> >>> >> Most likely the release manager still pings and asks about all
>>> those before bumping. Which means that in effect they were part of the burn
>>> down and do block the release in the sense that they must be re-triaged
>>> away to the next release. I would prefer less work for the release manager
>>> and more emphasis on the default being nonblocking.
>>> >>> >>
>>> >>> >> One very different possibility is to ignore Fix Version on open
>>> bugs and use a different search query as the burndown, auto bump everything
>>> that didn't make it.
>>> >>> >
>>> >>> > This may create a situation where an issue will eventually be
>>> closed, but Fix Version not updated, and confuse the users who will rely
>>> "Fix Version" to  find which release actually fixes the issue. A pass over
>>> open bugs with a Fix Version set to next release (as currently done by a
>>> release manager) helps to make sure that unfixed bugs won't have Fix
>>> Version tag of the upcoming release.
>>> >>> >
>>> >>> >>
>>> >>> >> Kenn
>>> >>> >>
>>> >>> >> On Fri, Oct 25, 2019, 14:16 Robert Bradshaw 
>>> wrote:
>>> >>> >>>
>>> >>> >>> I'm fine with that, but in that case we should have a priority
>>> for
>>> >>> >>> release blockers, below which bugs get automatically bumped to
>>> the
>>> >>> >>> next release (and which becomes the burndown list).
>>> >>> >>>
>>> >>> >>> On Fri, Oct 25, 2019 at 1:58 PM Kenneth Knowles 
>>> wrote:
>>> >>> >>> >
>>> >>> >>> > My takeaway from this thread is that priorities should have a
>>> shared community intuition and/or policy around how they are treated, which
>>> could eventually be formalized into SLOs.
>>> >>> >>> >
>>> >>> >>> > At a practical level, I do think that build breaks are higher
>>> priority than release blockers. If you are on this thread but not looking
>>> at the PR, here is the verbiage I added about urgency:
>>> >>> >>> >
>>> >>> >>> > P0/Blocker: "A P0 issue is more urgent than simply blocking
>>> the next 

Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics

2020-05-11 Thread Alex Amato
Thanks for the great feedback so far :). I've incorporated many new ideas and
made some revisions. Both docs have changed a fair bit since the initial
mail-out.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics

PTAL and let me know what you think; hopefully we can resolve the major
issues by the end of the week. I'll try to finalize things by then, but of
course I'll always stay open to your great ideas. :)
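
(Editor's note: the actual proposal lives in the linked docs; as a rough, hedged sketch of the kind of per-request instrumentation being discussed, here is what request counts and latencies could look like using metric primitives that already exist in the Beam Python SDK. Metrics.counter and Metrics.distribution stand in for the histogram metric the doc proposes; make_client and client.lookup are hypothetical placeholders, not APIs from the proposal.)

```python
import time

import apache_beam as beam
from apache_beam.metrics import Metrics


class InstrumentedRequestFn(beam.DoFn):
    """Sketch only: per-request count and latency metrics around an IO call."""

    def __init__(self, make_client):
        # Hypothetical client factory; not part of the proposal.
        self._make_client = make_client
        self._requests = Metrics.counter('gcp_io', 'request_count')
        self._errors = Metrics.counter('gcp_io', 'request_errors')
        # Distribution used as a stand-in for the proposed histogram metric.
        self._latency_ms = Metrics.distribution('gcp_io', 'request_latency_ms')

    def setup(self):
        self._client = self._make_client()

    def process(self, element):
        start = time.time()
        self._requests.inc()
        try:
            response = self._client.lookup(element)  # hypothetical call
        except Exception:
            self._errors.inc()
            raise
        finally:
            self._latency_ms.update(int((time.time() - start) * 1000))
        yield response
```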

On Wed, May 6, 2020 at 6:19 PM Alex Amato  wrote:

> Thanks everyone so far for taking a look so far :).
>
> I am hoping to have finalized the two reviews by the end of next week,
> May 15th.
>
> I'll continue to follow up on feedback and make changes, and I will add
> some more mentions to the documents to draw attention
>
> https://s.apache.org/beam-gcp-debuggability
>  https://s.apache.org/beam-histogram-metrics
>
> On Wed, May 6, 2020 at 10:00 AM Luke Cwik  wrote:
>
>> Thanks, also took a look and left some comments.
>>
>> On Tue, May 5, 2020 at 6:24 PM Alex Amato  wrote:
>>
>>> Hello,
>>>
>>> I created another design document. This time for GCP IO Debuggability
>>> Metrics. Which defines some new metrics to collect in the GCP IO libraries.
>>> This is for monitoring request counts and request latencies.
>>>
>>> Please take a look and let me know what you think:
>>> https://s.apache.org/beam-gcp-debuggability
>>>
>>> I also sent out a separate design yesterday (
>>> https://s.apache.org/beam-histogram-metrics) which is related as this
>>> document uses a Histogram style metric :).
>>>
>>> I would love some feedback to make this feature the best possible :D,
>>> Alex
>>>
>>


Re: Apache Beam application to Season of Docs 2020

2020-05-11 Thread Aizhamal Nurmamat kyzy
PS: I am registered as a mentor too, happy to onboard the tech writers to
The Apache Way and Beam community processes when time comes.

On Mon, May 11, 2020 at 1:52 PM Aizhamal Nurmamat kyzy 
wrote:

> Apache Beam got accepted into Season of Docs, yay! [1]
>
> @Kyle/Pablo, could you please fill out the mentor registration form by
> tomorrow morning [2]?
>
> Is there anyone else interested in becoming a mentor for either of the 2
> projects [3]? The program requires at least two open source mentors for
> each technical writer in case we get 2.
>
> [1] https://developers.google.com/season-of-docs/docs/participants/
> [2]
> https://docs.google.com/forms/d/e/1FAIpQLSfMZ3yCf24PFUbvpSOkVy4sZUJFY5oS7HbGyXTNXzFg2btp4Q/viewform
> [3] https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>
>
> On Tue, May 5, 2020 at 5:31 PM Pablo Estrada  wrote:
>
>> I think I am willing to help with Project 2.
>>
>> On Tue, May 5, 2020 at 2:01 PM Aizhamal Nurmamat kyzy <
>> aizha...@apache.org> wrote:
>>
>>> Thank you, Kyle! really appreciate it! I will add you as a mentor into
>>> the cwiki page, and let you know if Beam gets accepted to the program on
>>> May 11th.
>>>
>>> On Tue, May 5, 2020 at 12:55 PM Kyle Weaver  wrote:
>>>
 Thanks Aizhamal! I would be willing to help with project 1.

 On Mon, May 4, 2020 at 5:04 PM Aizhamal Nurmamat kyzy <
 aizha...@apache.org> wrote:

> Hi all,
>
> I have submitted the application to the Season of Docs program with
> the project ideas we have developed last year [1]. I learnt about its
> deadline a few hours ago and didn't want to miss it.
>
> Feel free to add more project ideas (or edit the current ones) until
> May 7th.
>
> If Beam gets approved, we will get 1 or 2 experienced technical
> writers to help us document community processes or some Beam features. Is
> anyone else willing to mentor for these projects?
>
> [1]
> https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs
>



Re: Apache Beam application to Season of Docs 2020

2020-05-11 Thread Aizhamal Nurmamat kyzy
Apache Beam got accepted into Season of Docs, yay! [1]

@Kyle/Pablo, could you please fill out the mentor registration form by
tomorrow morning [2]?

Is there anyone else interested in becoming a mentor for either of the 2
projects [3]? The program requires at least two open source mentors for
each technical writer in case we get 2.

[1] https://developers.google.com/season-of-docs/docs/participants/
[2]
https://docs.google.com/forms/d/e/1FAIpQLSfMZ3yCf24PFUbvpSOkVy4sZUJFY5oS7HbGyXTNXzFg2btp4Q/viewform
[3] https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs


On Tue, May 5, 2020 at 5:31 PM Pablo Estrada  wrote:

> I think I am willing to help with Project 2.
>
> On Tue, May 5, 2020 at 2:01 PM Aizhamal Nurmamat kyzy 
> wrote:
>
>> Thank you, Kyle! really appreciate it! I will add you as a mentor into
>> the cwiki page, and let you know if Beam gets accepted to the program on
>> May 11th.
>>
>> On Tue, May 5, 2020 at 12:55 PM Kyle Weaver  wrote:
>>
>>> Thanks Aizhamal! I would be willing to help with project 1.
>>>
>>> On Mon, May 4, 2020 at 5:04 PM Aizhamal Nurmamat kyzy <
>>> aizha...@apache.org> wrote:
>>>
 Hi all,

 I have submitted the application to the Season of Docs program with the
 project ideas we have developed last year [1]. I learnt about its deadline
 a few hours ago and didn't want to miss it.

 Feel free to add more project ideas (or edit the current ones) until
 May 7th.

 If Beam gets approved, we will get 1 or 2 experienced technical writers
 to help us document community processes or some Beam features. Is anyone
 else willing to mentor for these projects?

 [1]
 https://cwiki.apache.org/confluence/display/BEAM/Google+Season+of+Docs

>>>


Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-11 Thread Nam Bui
Updates for today:
- Thanks Brian & Ahmet for your reviews. I left comments on some of the
questions and also made new changes in response to the reviews [1].
- I see that the new blog post was merged yesterday, so I added it to the
PR as well.

I briefly tried Robert's script with the built files from the old and new
websites as input. It seemed to work well at detecting missing files (or,
more likely, broken links pointing to missing files). I will push another
commit to fix all of that up, hopefully tomorrow.

[1] https://github.com/apache/beam/pull/11554#issuecomment-626792031

Best regards,
Nam
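
(Editor's note: Robert's actual script is not reproduced in this archive; as a hedged sketch of the kind of check described in this thread, comparing two generated-site trees for missing files and diffing the visible text of pages present in both, something along these lines would do. The old_site/new_site directory layout is assumed, and the tag stripping is deliberately crude.)

```python
#!/usr/bin/env python3
"""Rough sketch (not Robert's script): compare two generated-site directories."""
import difflib
import os
import re
import sys


def html_files(root):
    """Relative paths of all .html files under `root`."""
    paths = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.html'):
                paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def visible_text(path):
    """Crude visible-text extraction: drop scripts/styles/tags, squash whitespace."""
    with open(path, encoding='utf-8', errors='replace') as f:
        html = f.read()
    html = re.sub(r'(?s)<(script|style).*?</\1>', ' ', html)
    text = re.sub(r'(?s)<[^>]+>', ' ', html)
    return [chunk for chunk in re.sub(r'\s+', ' ', text).split('. ') if chunk.strip()]


def main(old_root, new_root):
    old, new = html_files(old_root), html_files(new_root)
    for missing in sorted(old - new):
        print('MISSING in new site:', missing)
    for extra in sorted(new - old):
        print('ONLY in new site:', extra)
    for rel in sorted(old & new):
        diff = difflib.unified_diff(
            visible_text(os.path.join(old_root, rel)),
            visible_text(os.path.join(new_root, rel)),
            fromfile='old/' + rel, tofile='new/' + rel, lineterm='')
        for line in diff:
            print(line)


if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])  # e.g. ./diff_sites.py old_site new_site
```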


On Mon, May 11, 2020 at 9:01 AM Nam Bui  wrote:

> Hi,
>
> @Ahmet: Yeah, it's all clear to me. :)
> @Robert: Thanks for your ideas and also the script. It really helps me to
> serve my works.
>
> Best regard!
>
> On Sat, May 9, 2020 at 2:10 AM Ahmet Altay  wrote:
>
>> This sounds reasonable to me. Thank you. Nam, does it make sense to you?
>>
>> On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw 
>> wrote:
>>
>>> I'd really like to not see this work go to waste, both the original
>>> revision, the further efforts Nam has done in making it more manageable to
>>> review, and the work put into reviewing this so far, so we can get the
>>> benefits of being on Hugo. How about this for a concrete proposal:
>>>
>>> (1) We get "standard" approval from one or more committers for the
>>> infrastructure changes, just as with any other PR. Brian has
>>> already started this, but if others could step up as well that'd be great.
>>>
>>> (2) Reviewers (and authors) typically count on (or request) sufficient
>>> automated test coverage to augment the fact that their eyeballs are
>>> fallible, which is something that is missing here (and given the size of
>>> the change not easily compensated for by a more detailed manual review).
>>> How about we use the script above (or similar) as an automated test to
>>> validate the website's contents haven't (materially) changed. I feel we've
>>> validated enough that the style looks good via spot checking (which is
>>> something that should work on all pages if it works on one). The diff
>>> between the current site and the newly generated site should be empty (it
>>> might already be [1]), or at least we should get a stamp of approval on the
>>> plain-text diff (which should be small), before merging.
>>>
>>> (3) To make things easier, everyone holds off on making any changes to
>>> the old site until a fixed future date (say, next Wednesday). Hopefully we
>>> can get it merged by then. If not, a condition for merging would be a
>>> commitment incorporating new changes after this date.
>>>
>>> Does this sound reasonable?
>>>
>>> - Robert
>>>
>>>
>>>
>>> [1] I'd be curious as to how small the diff already is, but my script
>>> relies on local directories with the generated HTML, which I don't have
>>> handy at the moment.
>>>
>>>
>>>
>>> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw 
>>> wrote:
>>>
 Here's a script that we could run on the old and new sites that should
 quickly catch any major issues but not get caught up in formatting minutia.



 On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw 
 wrote:

> On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <
> aizha...@apache.org> wrote:
>
>> I understand the difficulty, and this certainly comes with lessons
>> learned for future similar projects.
>>
>> To your questions Robert:
>> (1 and 2) I will commit to review the text in the resulting pages. I
>> will try and use some automation to extract visible text from each page 
>> and
>> diff it with the current state of the website. I can do this starting 
>> next
>> week. From some quick research, there seem to be tools that help with 
>> this
>> analysis (
>> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
>> )
>>
>
> At first glance it looks like these tools would give diffs that are
> *larger* than the 47K one we're struggling to review here.
>
>
>> By remaining in this state, we hold others up from making changes, or
>> we increase the amount of work needed after merging to port over changes
>> that may be missed. If we move forward, new changes can be done on top of
>> the new website.
>>
>
> I agree we don't want to hold others up from making changes. However,
> the amount of work to port changes over seems small in comparison to
> everything else that is being discussed here. (It also provides good
> incentives to reach the bar quickly and has the advantage of falling on 
> the
> right people.) (3) will still take some time.
>
> If we go this route, we're lowering the bar for doc changes, but not
> removing it.
>
>
>> (3) This makes sense. Brian, would you be able to spend some time to
>> look at the automation changes (build files and 

Re: [RESULT][VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-05-11 Thread Aizhamal Nurmamat kyzy
@Ismael, this is on my list of work items for as soon as we complete the
migration of the website.
@Kyle, thanks for filing the Jira; I assigned it to myself.

On Mon, May 11, 2020 at 10:03 AM Kyle Weaver  wrote:

> > Now that the vote has passed maybe we should add the images somewhere
> > in the website so people can easily find the Firefly to use it
>
> +1 Maybe something to revisit after the website overhaul is complete. I
> filed https://jira.apache.org/jira/browse/BEAM-9948 if anyone wants to
> take it.
>
> On Mon, May 11, 2020 at 12:57 PM Ismaël Mejía  wrote:
>
>> Now that the vote has passed maybe we should add the images somewhere
>> in the website so people can easily find the Firefly to use it.
>> Something like what we do with our logos
>> https://beam.apache.org/community/logos/
>>
>> WDYT? any taker?
>>
>> On Tue, Apr 28, 2020 at 7:43 PM Pablo Estrada  wrote:
>> >
>> > I'll be happy to as well!
>> >
>> > On Sun, Apr 26, 2020 at 4:18 AM Maximilian Michels 
>> wrote:
>> >>
>> >> Hey Maria,
>> >>
>> >> I can testify :)
>> >>
>> >> Cheers,
>> >> Max
>> >>
>> >> On 23.04.20 20:49, María Cruz wrote:
>> >> > Hi everyone!
>> >> > It is amazing to see how this process developed to collaboratively
>> >> > create Apache Beam's mascot. Thank you to everyone who got involved!
>> >> > I would like to write a blogpost for the Beam website, and I wanted
>> to
>> >> > ask you: would anyone like to offer their testimony about the
>> process of
>> >> > creating the Beam mascot, and what this means to you? Everyone's
>> >> > testimony is welcome! If you witnessed the development of a mascot
>> for
>> >> > another open source project, even better =)
>> >> >
>> >> > Please feel free to express interest on this thread, and I'll reach
>> out
>> >> > to you off-list.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > María
>> >> >
>> >> > On Fri, Apr 17, 2020 at 6:19 AM Jeff Klukas > >> > > wrote:
>> >> >
>> >> > I personally like the sound of "Datum" as a name. I also like the
>> >> > idea of not assigning them a gender.
>> >> >
>> >> > As a counterpoint on the naming side, one of the slide decks
>> >> > provided while iterating on the design mentions:
>> >> >
>> >> > > Mascot can change colors when it is “full of data” or has a
>> “batch
>> >> > of data” to process.  Yellow is supercharged and ready to
>> process!
>> >> >
>> >> > Based on that, I'd argue that the mascot maps to the concept of a
>> >> > bundle in the beam execution model and we should consider a name
>> >> > that's a play on "bundle" or perhaps a play on "checkpoint".
>> >> >
>> >> > On Thu, Apr 16, 2020 at 3:44 PM Julian Bruno <
>> juliangbr...@gmail.com
>> >> > > wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > While working on the design of our Mascot
>> >> > Some ideas showed up and I wish to share them.
>> >> > In regard to Alex Van Boxel's question about the name of our
>> Mascot.
>> >> >
>> >> > I was thinking about this yesterday night and feel it could
>> be a
>> >> > great idea to name the Mascot "*Data*" or "*Datum*". Both
>> names
>> >> > sound cute and make sense to me. I prefer the later. Datum
>> means
>> >> > a single piece of information. The Mascot is the first piece
>> of
>> >> > information and its job is to collect batches of data and
>> >> > process it. Datum is in charge of linking information
>> together.
>> >> >
>> >> > In addition, our Mascot should have no gender. Rendering it
>> >> > accessible to all users.
>> >> >
>> >> > Beam as a name for the mascot is pretty straight forward but
>> I
>> >> > think there are many things carrying that same name already.
>> >> >
>> >> > What do you think?
>> >> >
>> >> > Looking forward to hearing your feedback. Names are important
>> >> > and I feel it can expand the personality and create a cool
>> >> > background for our Mascot.
>> >> >
>> >> > Cheers!
>> >> >
>> >> > Julian
>> >> >
>> >> > On Mon, Apr 13, 2020, 3:40 PM Kyle Weaver <
>> kcwea...@google.com
>> >> > > wrote:
>> >> >
>> >> > Beam Firefly is fine with me (I guess people tend to
>> forget
>> >> > mascot names anyway). But if anyone comes up with
>> something
>> >> > particularly cute/clever we can consider it.
>> >> >
>> >> > On Mon, Apr 13, 2020 at 6:33 PM Aizhamal Nurmamat kyzy
>> >> > mailto:aizha...@apache.org>>
>> wrote:
>> >> >
>> >> > @Alex, Beam Firefly?
>> >> >
>> >> > On Thu, Apr 9, 2020 at 10:57 PM Alex Van Boxel
>> >> > mailto:a...@vanboxel.be>> wrote:
>> >> >
>> >> > We forgot something
>> >> >
>> >> > ...
>> >> >
>> >> > ...
>> >> >
>> >> > 

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-05-11 Thread Valentyn Tymofieiev
I agree with the point echoed earlier that the lowest and highest
supported versions will probably give the most useful test signal for
possible breakages. So 3.5 and 3.7 as high-priority versions SGTM.

This can change later once Beam drops 3.5 support.

On Mon, May 11, 2020 at 10:05 AM Yoshiki Obata 
wrote:

> Hello again,
>
> Test infrastructure update is ongoing and then we should determine
> which Python versions are high-priority.
>
> According to Pypi downloads stats[1], download proportion of Python
> 3.5 is almost always greater than one of 3.6 and 3.7.
> This situation has not changed since Robert told us Python 3.x
> occupies nearly 40% of downloads[2]
>
> On the other hand, according to docker hub[3],
> apachebeam/python3.x_sdk image downloaded the most is one of Python
> 3.7 which was pointed by Kyle[4].
>
> Considering these stats, I think high-priority versions are 3.5 and 3.7.
>
> Is this assumption appropriate?
> I would like to hear your thoughts about this.
>
> [1] https://pypistats.org/packages/apache-beam
> [2]
> https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
> [3] https://hub.docker.com/search?q=apachebeam%2Fpython=image
> [4]
> https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
>
> On Wed, May 6, 2020 at 12:48 Yoshiki Obata wrote:
> >
> > > Not sure how run_pylint.sh is related here - we should run linter on
> the entire codebase.
> > ah, I mistyped... I meant run_pytest.sh
> >
> > > I am familiar with beam_PostCommit_PythonXX suites. Is there something
> specific about these suites that you wanted to know?
> > Test suite runtime will depend on the number of  tests in the suite,
> > how many tests we run in parallel, how long they take to run. To
> > understand the load on test infrastructure we can monitor Beam test
> > health metrics [1]. In particular, if time in queue[2] is high, it is
> > a sign that there are not enough Jenkins slots available to start the
> > test suite earlier.
> > Sorry for ambiguous question. I wanted to know how to see the load on
> > test infrastructure.
> > The Grafana links you showed serves my purpose. Thank you.
> >
> > On Wed, May 6, 2020 at 2:35 Valentyn Tymofieiev wrote:
> > >
> > > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata 
> wrote:
> > >>
> > >> Thank you for comment, Valentyn.
> > >>
> > >> > 1) We can seed the smoke test suite with typehints tests, and add
> more tests later if there is a need. We can identify them by the file path
> or by special attributes in test files. Identifying them using filepath
> seems simpler and independent of test runner.
> > >>
> > >> Yes, making run_pylint.sh allow target test file paths as arguments is
> > >> good way if could.
> > >
> > >
> > > Not sure how run_pylint.sh is related here - we should run linter on
> the entire codebase.
> > >
> > >>
> > >> > 3)  We should reduce the code duplication across
> beam/sdks/python/test-suites/$runner/py3*. I think we could move the suite
> definition into a common file like
> beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate
> individual suites (beam/sdks/python/test-suites/$runner/py38/build.gradle)
> including the common file and/or logic from PythonNature [1].
> > >>
> > >> Exactly. I'll check it out.
> > >>
> > >> > 4) We have some tests that we run only under specific Python 3
> versions, for example: FlinkValidatesRunner test runs using Python 3.5: [2]
> > >> > HDFS Python 3 tests are running only with Python 3.7 [3].
> Cross-language Py3 tests for Spark are running under Python 3.5[4]: , there
> may be more test suites that selectively use particular versions.
> > >> > We need to correct such suites, so that we do not tie them  to a
> specific Python version. I see several options here: such tests should run
> either for all high-priority versions, or run only under the lowest version
> among the high-priority versions.  We don't have to fix them all at the
> same time. In general, we should try to make it as easy as possible to
> configure, whether a suite runs across all  versions, all high-priority
> versions, or just one version.
> > >>
> > >> The way of high-priority/low-priority configuration would be useful
> for this.
> > >> And which versions to be tested may be related to 5).
> > >>
> > >> > 5) If postcommit suites (that need to run against all versions)
> still constitute too much load on the infrastructure, we may need to
> investigate how to run these suites less frequently.
> > >>
> > >> That's certainly true, beam_PostCommit_PythonXX and
> > >> beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about 1
> > >> hour.
> > >> Does anyone have knowledge about this?
> > >
> > >
> > > I am familiar with beam_PostCommit_PythonXX suites. Is there something
> specific about these suites that you wanted to know?
> > > Test suite runtime will depend on the number of  tests in the suite,
> how many tests 

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-05-11 Thread Kyle Weaver
We've since moved our official Docker images here:
https://hub.docker.com/search?q=apache%2Fbeam_python=image

But Docker downloads are not as representative of actual usage as PyPI.

On Mon, May 11, 2020 at 1:05 PM Yoshiki Obata 
wrote:

> Hello again,
>
> Test infrastructure update is ongoing and then we should determine
> which Python versions are high-priority.
>
> According to Pypi downloads stats[1], download proportion of Python
> 3.5 is almost always greater than one of 3.6 and 3.7.
> This situation has not changed since Robert told us Python 3.x
> occupies nearly 40% of downloads[2]
>
> On the other hand, according to docker hub[3],
> apachebeam/python3.x_sdk image downloaded the most is one of Python
> 3.7 which was pointed by Kyle[4].
>
> Considering these stats, I think high-priority versions are 3.5 and 3.7.
>
> Is this assumption appropriate?
> I would like to hear your thoughts about this.
>
> [1] https://pypistats.org/packages/apache-beam
> [2]
> https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
> [3] https://hub.docker.com/search?q=apachebeam%2Fpython=image
> [4]
> https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
>
> On Wed, May 6, 2020 at 12:48 Yoshiki Obata wrote:
> >
> > > Not sure how run_pylint.sh is related here - we should run linter on
> the entire codebase.
> > ah, I mistyped... I meant run_pytest.sh
> >
> > > I am familiar with beam_PostCommit_PythonXX suites. Is there something
> specific about these suites that you wanted to know?
> > Test suite runtime will depend on the number of  tests in the suite,
> > how many tests we run in parallel, how long they take to run. To
> > understand the load on test infrastructure we can monitor Beam test
> > health metrics [1]. In particular, if time in queue[2] is high, it is
> > a sign that there are not enough Jenkins slots available to start the
> > test suite earlier.
> > Sorry for ambiguous question. I wanted to know how to see the load on
> > test infrastructure.
> > The Grafana links you showed serves my purpose. Thank you.
> >
> > On Wed, May 6, 2020 at 2:35 Valentyn Tymofieiev wrote:
> > >
> > > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata 
> wrote:
> > >>
> > >> Thank you for comment, Valentyn.
> > >>
> > >> > 1) We can seed the smoke test suite with typehints tests, and add
> more tests later if there is a need. We can identify them by the file path
> or by special attributes in test files. Identifying them using filepath
> seems simpler and independent of test runner.
> > >>
> > >> Yes, making run_pylint.sh allow target test file paths as arguments is
> > >> good way if could.
> > >
> > >
> > > Not sure how run_pylint.sh is related here - we should run linter on
> the entire codebase.
> > >
> > >>
> > >> > 3)  We should reduce the code duplication across
> beam/sdks/python/test-suites/$runner/py3*. I think we could move the suite
> definition into a common file like
> beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate
> individual suites (beam/sdks/python/test-suites/$runner/py38/build.gradle)
> including the common file and/or logic from PythonNature [1].
> > >>
> > >> Exactly. I'll check it out.
> > >>
> > >> > 4) We have some tests that we run only under specific Python 3
> versions, for example: FlinkValidatesRunner test runs using Python 3.5: [2]
> > >> > HDFS Python 3 tests are running only with Python 3.7 [3].
> Cross-language Py3 tests for Spark are running under Python 3.5[4]: , there
> may be more test suites that selectively use particular versions.
> > >> > We need to correct such suites, so that we do not tie them  to a
> specific Python version. I see several options here: such tests should run
> either for all high-priority versions, or run only under the lowest version
> among the high-priority versions.  We don't have to fix them all at the
> same time. In general, we should try to make it as easy as possible to
> configure, whether a suite runs across all  versions, all high-priority
> versions, or just one version.
> > >>
> > >> The way of high-priority/low-priority configuration would be useful
> for this.
> > >> And which versions to be tested may be related to 5).
> > >>
> > >> > 5) If postcommit suites (that need to run against all versions)
> still constitute too much load on the infrastructure, we may need to
> investigate how to run these suites less frequently.
> > >>
> > >> That's certainly true, beam_PostCommit_PythonXX and
> > >> beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about 1
> > >> hour.
> > >> Does anyone have knowledge about this?
> > >
> > >
> > > I am familiar with beam_PostCommit_PythonXX suites. Is there something
> specific about these suites that you wanted to know?
> > > Test suite runtime will depend on the number of  tests in the suite,
> how many tests we run in parallel, how long they take to run. To understand
> the load on test 

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-05-11 Thread Yoshiki Obata
Hello again,

The test infrastructure update is ongoing, and next we should determine
which Python versions are high-priority.

According to the PyPI download stats [1], the download proportion of Python
3.5 is almost always greater than that of 3.6 and 3.7.
This situation has not changed since Robert told us Python 3.x
accounts for nearly 40% of downloads [2].

On the other hand, according to Docker Hub [3], the most-downloaded
apachebeam/python3.x_sdk image is the Python 3.7 one, as Kyle
pointed out [4].

Considering these stats, I think high-priority versions are 3.5 and 3.7.

Is this assumption appropriate?
I would like to hear your thoughts about this.

[1] https://pypistats.org/packages/apache-beam
[2] 
https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
[3] https://hub.docker.com/search?q=apachebeam%2Fpython=image
[4] 
https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
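
(Editor's note: the proportions referenced in [1] can also be pulled programmatically. A small hedged sketch follows; the python_minor endpoint and its "category"/"downloads" fields are assumptions taken from the pypistats.org API documentation, not from this thread.)

```python
"""Sketch: summarize apache-beam downloads by Python minor version via pypistats.org."""
from collections import Counter

import requests  # third-party; `pip install requests`

URL = "https://pypistats.org/api/packages/apache-beam/python_minor"


def download_share():
    data = requests.get(URL, timeout=30).json().get("data", [])
    totals = Counter()
    for row in data:
        # Each row is assumed to carry a Python minor version ("3.5", "3.7", ...)
        # in "category" and a daily download count in "downloads".
        totals[row.get("category", "unknown")] += row.get("downloads", 0)
    grand_total = sum(totals.values()) or 1
    for version, count in totals.most_common():
        print(f"{version:>8}: {100.0 * count / grand_total:5.1f}%  ({count})")


if __name__ == "__main__":
    download_share()
```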

On Wed, May 6, 2020 at 12:48 Yoshiki Obata wrote:
>
> > Not sure how run_pylint.sh is related here - we should run linter on the 
> > entire codebase.
> ah, I mistyped... I meant run_pytest.sh
>
> > I am familiar with beam_PostCommit_PythonXX suites. Is there something 
> > specific about these suites that you wanted to know?
> Test suite runtime will depend on the number of  tests in the suite,
> how many tests we run in parallel, how long they take to run. To
> understand the load on test infrastructure we can monitor Beam test
> health metrics [1]. In particular, if time in queue[2] is high, it is
> a sign that there are not enough Jenkins slots available to start the
> test suite earlier.
> Sorry for ambiguous question. I wanted to know how to see the load on
> test infrastructure.
> The Grafana links you showed serves my purpose. Thank you.
>
> On Wed, May 6, 2020 at 2:35 Valentyn Tymofieiev wrote:
> >
> > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata  
> > wrote:
> >>
> >> Thank you for comment, Valentyn.
> >>
> >> > 1) We can seed the smoke test suite with typehints tests, and add more 
> >> > tests later if there is a need. We can identify them by the file path or 
> >> > by special attributes in test files. Identifying them using filepath 
> >> > seems simpler and independent of test runner.
> >>
> >> Yes, making run_pylint.sh allow target test file paths as arguments is
> >> good way if could.
> >
> >
> > Not sure how run_pylint.sh is related here - we should run linter on the 
> > entire codebase.
> >
> >>
> >> > 3)  We should reduce the code duplication across  
> >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move the 
> >> > suite definition into a common file like 
> >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and populate 
> >> > individual suites 
> >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including the 
> >> > common file and/or logic from PythonNature [1].
> >>
> >> Exactly. I'll check it out.
> >>
> >> > 4) We have some tests that we run only under specific Python 3 versions, 
> >> > for example: FlinkValidatesRunner test runs using Python 3.5: [2]
> >> > HDFS Python 3 tests are running only with Python 3.7 [3]. Cross-language 
> >> > Py3 tests for Spark are running under Python 3.5[4]: , there may be more 
> >> > test suites that selectively use particular versions.
> >> > We need to correct such suites, so that we do not tie them  to a 
> >> > specific Python version. I see several options here: such tests should 
> >> > run either for all high-priority versions, or run only under the lowest 
> >> > version among the high-priority versions.  We don't have to fix them all 
> >> > at the same time. In general, we should try to make it as easy as 
> >> > possible to configure, whether a suite runs across all  versions, all 
> >> > high-priority versions, or just one version.
> >>
> >> The way of high-priority/low-priority configuration would be useful for 
> >> this.
> >> And which versions to be tested may be related to 5).
> >>
> >> > 5) If postcommit suites (that need to run against all versions) still 
> >> > constitute too much load on the infrastructure, we may need to 
> >> > investigate how to run these suites less frequently.
> >>
> >> That's certainly true, beam_PostCommit_PythonXX and
> >> beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about 1
> >> hour.
> >> Does anyone have knowledge about this?
> >
> >
> > I am familiar with beam_PostCommit_PythonXX suites. Is there something 
> > specific about these suites that you wanted to know?
> > Test suite runtime will depend on the number of  tests in the suite, how 
> > many tests we run in parallel, how long they take to run. To understand the 
> > load on test infrastructure we can monitor Beam test health metrics [1]. In 
> > particular, if time in queue[2] is high, it is a sign that there are not 
> > enough Jenkins slots available to start the test suite earlier.
> >
> > [1] 

Re: Greetings from Borzoo

2020-05-11 Thread Borzoo Esmailloo
Thank you :)

On Mon, May 11, 2020 at 6:30 PM Luke Cwik  wrote:

> Welcome, I have added you as a JIRA contributor.
>
> The Apache Beam contribution guide[1] is a good starting point and/or the
> starter JIRAs[2].
>
> 1: https://beam.apache.org/contribute/
> 2: https://s.apache.org/beam-starter-tasks
>
> On Sat, May 9, 2020 at 7:55 AM Borzoo Esmailloo <
> borzoo.esmail...@gmail.com> wrote:
>
>> Hello,
>>
>> I want to start contributing to Beam. It would be great if someone could
>> add me to JIRA as a contributor.
>>
>> JIRA username: "brz"
>>
>> Best,
>> Borzoo
>>
>


Re: [RESULT][VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-05-11 Thread Kyle Weaver
> Now that the vote has passed maybe we should add the images somewhere
> in the website so people can easily find the Firefly to use it

+1 Maybe something to revisit after the website overhaul is complete. I
filed https://jira.apache.org/jira/browse/BEAM-9948 if anyone wants to take
it.

On Mon, May 11, 2020 at 12:57 PM Ismaël Mejía  wrote:

> Now that the vote has passed maybe we should add the images somewhere
> in the website so people can easily find the Firefly to use it.
> Something like what we do with our logos
> https://beam.apache.org/community/logos/
>
> WDYT? any taker?
>
> On Tue, Apr 28, 2020 at 7:43 PM Pablo Estrada  wrote:
> >
> > I'll be happy to as well!
> >
> > On Sun, Apr 26, 2020 at 4:18 AM Maximilian Michels 
> wrote:
> >>
> >> Hey Maria,
> >>
> >> I can testify :)
> >>
> >> Cheers,
> >> Max
> >>
> >> On 23.04.20 20:49, María Cruz wrote:
> >> > Hi everyone!
> >> > It is amazing to see how this process developed to collaboratively
> >> > create Apache Beam's mascot. Thank you to everyone who got involved!
> >> > I would like to write a blogpost for the Beam website, and I wanted to
> >> > ask you: would anyone like to offer their testimony about the process
> of
> >> > creating the Beam mascot, and what this means to you? Everyone's
> >> > testimony is welcome! If you witnessed the development of a mascot for
> >> > another open source project, even better =)
> >> >
> >> > Please feel free to express interest on this thread, and I'll reach
> out
> >> > to you off-list.
> >> >
> >> > Thanks,
> >> >
> >> > María
> >> >
> >> > On Fri, Apr 17, 2020 at 6:19 AM Jeff Klukas  >> > > wrote:
> >> >
> >> > I personally like the sound of "Datum" as a name. I also like the
> >> > idea of not assigning them a gender.
> >> >
> >> > As a counterpoint on the naming side, one of the slide decks
> >> > provided while iterating on the design mentions:
> >> >
> >> > > Mascot can change colors when it is “full of data” or has a
> “batch
> >> > of data” to process.  Yellow is supercharged and ready to process!
> >> >
> >> > Based on that, I'd argue that the mascot maps to the concept of a
> >> > bundle in the beam execution model and we should consider a name
> >> > that's a play on "bundle" or perhaps a play on "checkpoint".
> >> >
> >> > On Thu, Apr 16, 2020 at 3:44 PM Julian Bruno <
> juliangbr...@gmail.com
> >> > > wrote:
> >> >
> >> > Hi all,
> >> >
> >> > While working on the design of our Mascot
> >> > Some ideas showed up and I wish to share them.
> >> > In regard to Alex Van Boxel's question about the name of our
> Mascot.
> >> >
> >> > I was thinking about this yesterday night and feel it could
> be a
> >> > great idea to name the Mascot "*Data*" or "*Datum*". Both
> names
> >> > sound cute and make sense to me. I prefer the later. Datum
> means
> >> > a single piece of information. The Mascot is the first piece
> of
> >> > information and its job is to collect batches of data and
> >> > process it. Datum is in charge of linking information
> together.
> >> >
> >> > In addition, our Mascot should have no gender. Rendering it
> >> > accessible to all users.
> >> >
> >> > Beam as a name for the mascot is pretty straight forward but I
> >> > think there are many things carrying that same name already.
> >> >
> >> > What do you think?
> >> >
> >> > Looking forward to hearing your feedback. Names are important
> >> > and I feel it can expand the personality and create a cool
> >> > background for our Mascot.
> >> >
> >> > Cheers!
> >> >
> >> > Julian
> >> >
> >> > On Mon, Apr 13, 2020, 3:40 PM Kyle Weaver <
> kcwea...@google.com
> >> > > wrote:
> >> >
> >> > Beam Firefly is fine with me (I guess people tend to
> forget
> >> > mascot names anyway). But if anyone comes up with
> something
> >> > particularly cute/clever we can consider it.
> >> >
> >> > On Mon, Apr 13, 2020 at 6:33 PM Aizhamal Nurmamat kyzy
> >> > mailto:aizha...@apache.org>> wrote:
> >> >
> >> > @Alex, Beam Firefly?
> >> >
> >> > On Thu, Apr 9, 2020 at 10:57 PM Alex Van Boxel
> >> > mailto:a...@vanboxel.be>> wrote:
> >> >
> >> > We forgot something
> >> >
> >> > ...
> >> >
> >> > ...
> >> >
> >> > it/she/he needs a *name*!
> >> >
> >> >
> >> >  _/
> >> > _/ Alex Van Boxel
> >> >
> >> >
> >> > On Fri, Apr 10, 2020 at 6:19 AM Kenneth Knowles
> >> > mailto:k...@apache.org>> wrote:
> >> >
> >> > Looking forward to the guide. I enjoy doing
> >> >   

Re: [RESULT][VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-05-11 Thread Ismaël Mejía
Now that the vote has passed, maybe we should add the images somewhere
on the website so people can easily find the Firefly to use it.
Something like what we do with our logos:
https://beam.apache.org/community/logos/

WDYT? Any takers?

On Tue, Apr 28, 2020 at 7:43 PM Pablo Estrada  wrote:
>
> I'll be happy to as well!
>
> On Sun, Apr 26, 2020 at 4:18 AM Maximilian Michels  wrote:
>>
>> Hey Maria,
>>
>> I can testify :)
>>
>> Cheers,
>> Max
>>
>> On 23.04.20 20:49, María Cruz wrote:
>> > Hi everyone!
>> > It is amazing to see how this process developed to collaboratively
>> > create Apache Beam's mascot. Thank you to everyone who got involved!
>> > I would like to write a blogpost for the Beam website, and I wanted to
>> > ask you: would anyone like to offer their testimony about the process of
>> > creating the Beam mascot, and what this means to you? Everyone's
>> > testimony is welcome! If you witnessed the development of a mascot for
>> > another open source project, even better =)
>> >
>> > Please feel free to express interest on this thread, and I'll reach out
>> > to you off-list.
>> >
>> > Thanks,
>> >
>> > María
>> >
>> > On Fri, Apr 17, 2020 at 6:19 AM Jeff Klukas > > > wrote:
>> >
>> > I personally like the sound of "Datum" as a name. I also like the
>> > idea of not assigning them a gender.
>> >
>> > As a counterpoint on the naming side, one of the slide decks
>> > provided while iterating on the design mentions:
>> >
>> > > Mascot can change colors when it is “full of data” or has a “batch
>> > of data” to process.  Yellow is supercharged and ready to process!
>> >
>> > Based on that, I'd argue that the mascot maps to the concept of a
>> > bundle in the beam execution model and we should consider a name
>> > that's a play on "bundle" or perhaps a play on "checkpoint".
>> >
>> > On Thu, Apr 16, 2020 at 3:44 PM Julian Bruno > > > wrote:
>> >
>> > Hi all,
>> >
>> > While working on the design of our Mascot
>> > Some ideas showed up and I wish to share them.
>> > In regard to Alex Van Boxel's question about the name of our 
>> > Mascot.
>> >
>> > I was thinking about this yesterday night and feel it could be a
>> > great idea to name the Mascot "*Data*" or "*Datum*". Both names
>> > sound cute and make sense to me. I prefer the later. Datum means
>> > a single piece of information. The Mascot is the first piece of
>> > information and its job is to collect batches of data and
>> > process it. Datum is in charge of linking information together.
>> >
>> > In addition, our Mascot should have no gender. Rendering it
>> > accessible to all users.
>> >
>> > Beam as a name for the mascot is pretty straight forward but I
>> > think there are many things carrying that same name already.
>> >
>> > What do you think?
>> >
>> > Looking forward to hearing your feedback. Names are important
>> > and I feel it can expand the personality and create a cool
>> > background for our Mascot.
>> >
>> > Cheers!
>> >
>> > Julian
>> >
>> > On Mon, Apr 13, 2020, 3:40 PM Kyle Weaver > > > wrote:
>> >
>> > Beam Firefly is fine with me (I guess people tend to forget
>> > mascot names anyway). But if anyone comes up with something
>> > particularly cute/clever we can consider it.
>> >
>> > On Mon, Apr 13, 2020 at 6:33 PM Aizhamal Nurmamat kyzy
>> > mailto:aizha...@apache.org>> wrote:
>> >
>> > @Alex, Beam Firefly?
>> >
>> > On Thu, Apr 9, 2020 at 10:57 PM Alex Van Boxel
>> > mailto:a...@vanboxel.be>> wrote:
>> >
>> > We forgot something
>> >
>> > ...
>> >
>> > ...
>> >
>> > it/she/he needs a *name*!
>> >
>> >
>> >  _/
>> > _/ Alex Van Boxel
>> >
>> >
>> > On Fri, Apr 10, 2020 at 6:19 AM Kenneth Knowles
>> > mailto:k...@apache.org>> wrote:
>> >
>> > Looking forward to the guide. I enjoy doing
>> > (bad) drawings as a way to relax. And I want
>> > them to be properly on brand :-)
>> >
>> > Kenn
>> >
>> > On Thu, Apr 9, 2020 at 10:35 AM Maximilian
>> > Michels mailto:m...@apache.org>>
>> > wrote:
>> >
>> > Awesome. What a milestone! The mascot is a
>> > real eye catcher. Thank you
>> > Julian and Aizhamal for making it happen.
>> >
>> > On 06.04.20 22:05, Aizhamal Nurmamat kyzy 
>> > 

Re: Support for AWS SDK v2 and enhanced fanout in KinesisIO

2020-05-11 Thread Alexey Romanenko


Thanks Ismaël for reviving this thread. I think we should start making an effort to
deprecate the AWS SDK V1 IOs that are already implemented using V2 (if there are no
other objections). In that case, it would make sense to abstract some common code ONLY
if, for some reason, we would like to keep both versions of an IO (which is not the
case for now, afaik).

> On 8 May 2020, at 00:03, Ismaël Mejía  wrote:
> 
> Achieving good abstractions will prove elusive since the APIs differ
> and we will end up with a ton of extra maintenance work that should not be
> Beam's responsibility. I know that similar code (almost copy
> pasteable) is not nice to have but we should consider this as a
> temporary measure and probably try to make deprecation (and possible
> removal) of older versions as soon as possible.
> Please remember that this was already discussed in the past [1] and
> the soft consensus was around rapid deprecation AWS SDK v1 IOs already
> available in their v2 version and only do improvements on v1 IOs for
> security related issues and dependency updates. This has not happened
> yet but the only missing thing is someone to take action.
> 
> +cammac...@gmail.com would you have time to start rolling this plan?
> Otherwise someone else can jump in and do it too.
> 
> [1] 
> https://lists.apache.org/thread.html/130cb60e6bcdd58c5afdd0c375663eaf05e705aab9ee0196535cd17f%40%3Cdev.beam.apache.org%3E
> 
> On Thu, May 7, 2020 at 11:05 PM Luke Cwik  wrote:
>> 
>> I think you should try and share as much as is reasonable. Using what is 
>> shared between AWS V1 and V2 SDKs would be a good signal as to what should 
>> be shared in Beam. There might be some places where a trivial wrapper could 
>> help but I wouldn't try to create a bunch of grand abstractions that fit 
>> both versions of the AWS SDKs.
>> 
>> On Wed, May 6, 2020 at 9:59 AM Alexey Romanenko  
>> wrote:
>>> 
>>> I’d like to get back to this question, fairly raised by Jonothan a while 
>>> ago, since it actually affects not only KinesisIO but all the other AWS 
>>> IOs that use the AWS SDK in two different versions.
>>> 
>>> My personal opinion - I’d strongly like to avoid copy-pasting the 
>>> SDK-independent code of the same IO for two different SDK versions (and 
>>> supporting it twice). If we wish to provide an IO for two AWS SDK 
>>> versions, we have to extract this core logic into one common, 
>>> SDK-independent framework that either SDK version can then use.
>>> 
>>> At the same time, if it requires a lot of effort to achieve that, then we 
>>> probably need to decide to keep only one IO based on a single AWS SDK 
>>> version (I guess it will be V2, as the more modern one) and deprecate the 
>>> old version (and later remove it to avoid confusion for users).
>>> 
>>> What do others think about that?
>>> 
>>> On 18 Apr 2020, at 01:48, Jonothan Farr  wrote:
>>> 
>>> Hi, I wanted to try out KinesisIO with enhanced fanout for a PoC I was 
>>> working on, so I ended up submitting that as 
>>> https://github.com/apache/beam/pull/9899.
>>> 
>>> But since that still needs work to be fully functional (right now it does 
>>> not handle resharding), I figured I could at least contribute the updates 
>>> to make KinesisIO compatible with the AWS SDK v2, so I split that out and 
>>> submitted it as: https://github.com/apache/beam/pull/11318.
>>> 
>>> Now some totally reasonable concerns have been raised about maintaining two 
>>> separate codebases, and having to dual-commit every bugfix or feature 
>>> enhancement. That seems to already be the case for all of the other AWS IOs 
>>> in amazon-web-services2 (dynamodb, sns, sqs), but do we want to continue 
>>> that trend? What do members of the community think?
>>> 
>>> One solution is obviously to rewrite the AWS IOs with an added abstraction 
>>> layer so that the SDK-specific code can be factored out. However, the 
>>> KinesisIO class itself already has dependencies on the v1 AWS SDK so I'm 
>>> not aware of a way that this could be done so that it's 
>>> backwards-compatible. This means that everyone currently using KinesisIO 
>>> would most likely have to update their code to use the new interface 
>>> whether they wanted to use the new v2 SDK or continue to use v1. I think a 
>>> rewrite would also be a non-trivial task with more than a few details to be 
>>> worked out first.
>>> 
>>> From my perspective, the options are:
>>> 
>>> - Merge #11318 now and the community can use the v2 AWS SDK with KinesisIO 
>>> and I can continue working on enhanced fanout while a rewrite is being 
>>> worked on, but the community now has to dual-commit any changes to 
>>> KinesisIO.
>>> - Close #11318 and continue working on #9899 and I alone have to 
>>> dual-commit the changes until it can be merged, at which point everyone can 
>>> use enhanced fanout while the rewrite is being worked on. Then the 
>>> community will also have to dual-commit but only after enhanced fanout is 

Re: Greetings from Borzoo

2020-05-11 Thread Luke Cwik
Welcome, I have added you as a JIRA contributor.

The Apache Beam contribution guide[1] is a good starting point and/or the
starter JIRAs[2].

1: https://beam.apache.org/contribute/
2: https://s.apache.org/beam-starter-tasks

On Sat, May 9, 2020 at 7:55 AM Borzoo Esmailloo 
wrote:

> Hello,
>
> I want to start contributing to Beam. It would be great if someone could
> add me to JIRA as a contributor.
>
> JIRA username: "brz"
>
> Best,
> Borzoo
>


Re: [DISCUSS] Dealing with @Ignored tests

2020-05-11 Thread Luke Cwik
Deleting ignored tests means losing the record of why the test case existed,
so I would rather keep them around. I think it would be more
valuable to generate a report that goes on the website/wiki showing
stability of the modules (num tests, num passed, num skipped, num failed
(running averages over the past N runs)). We had discussed doing something
like this for ValidatesRunner so we could show which runner supports what
automatically.

On Mon, May 11, 2020 at 12:53 AM Jan Lukavský  wrote:

> I think that we do have Jira issues for ignored tests, so there should be no
> problem with that. The questionable point is that when a test gets Ignored,
> people might consider the problem "less painful" and postpone the
> correct solution until ... forever. I'd just like to discuss whether people see
> this as an issue. If yes, should we do something about it; if no,
> maybe we can create a rule that a test marked as Ignored for a long time may
> be deleted, because it is apparently only dead code.
> On 5/6/20 6:30 PM, Kenneth Knowles wrote:
>
> Good point.
>
> The raw numbers are available in the test run output. See
> https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PreCommit_Java_Cron/2718/testReport/
>  for
> the "skipped" column.
> And you get the same on console or Gradle Scan:
> https://scans.gradle.com/s/ml3jv5xctkrmg/tests?collapse-all
> This would be good to review periodically for obvious trouble spots.
>
> But I think you mean something more detailed. Some report with columns:
> Test Suite, Test Method, Jira, Date Ignored, Most Recent Update
>
> I think we can get most of this from Jira, if we just make sure that each
> ignored test has a Jira and they are all labeled in a consistent way. That
> would be the quickest way to get some result, even though it is not
> perfectly automated and audited.
>
> Kenn
>
> On Tue, May 5, 2020 at 2:41 PM Jan Lukavský  wrote:
>
>> Hi,
>>
>> it seems we are accumulating test cases (see discussion in [1]) that are
>> marked as @Ignored (mostly due to flakiness), which is generally
>> undesirable. Associated JIRAs seem to be open for a long time, and this
>> might generally cause us to lose code coverage. Would anyone have an
>> idea of how to visualize these Ignored tests better? My first idea would
>> be something similar to the "Beam dependency check report", but that seems
>> to be not the best example (which is a completely different issue :)).
>>
>> Jan
>>
>> [1] https://github.com/apache/beam/pull/11614
>>
>>


Re: Beam 2.21 release update

2020-05-11 Thread Luke Cwik
Thanks for the fix Max.

On Mon, May 11, 2020 at 5:46 AM Maximilian Michels  wrote:

> FYI I've created this issue and marked it as a blocker:
> https://jira.apache.org/jira/browse/BEAM-9947
>
> Essentially, the timer encoding is broken for all non-standard key
> coders. The fix can be found here:
> https://github.com/apache/beam/pull/11658
>
> -Max
>
> On 08.05.20 18:53, Udi Meiri wrote:
> > +Chad Dombrova  , who added
> _find_protoc_gen_mypy.
> >
> > I'm guessing that the code
> > in _install_grpcio_tools_and_generate_proto_files creates a kind of
> > virtualenv, but it only works well for staging Python modules and not
> > binaries like protoc-gen-mypy.
> > (I assume there's a reason why it doesn't invoke virtualenv, probably
> > since the list of things setup.py can expect to be installed is very
> > minimal (setuptools).)
> >
> > One solution would be to make these setup.py dependencies explicit in
> > pyproject.toml, such that pip installs them before running
> > setup.py:
> https://pip.pypa.io/en/stable/reference/pip/#pep-517-and-518-support
> > It would help when using tools like pip ("pip wheel"), but I'm not sure
> > what the alternative for "python setup.py sdist" is.
> >
> >
> > On Thu, May 7, 2020 at 10:40 PM Thomas Weise  > > wrote:
> >
> > No additional stacktraces. Full error output below.
> >
> > It's not clear what is going wrong.
> >
> > There isn't any exception from the subprocess execution since the
> > "WARNING:root:Installing grpcio-tools took 305.39 seconds." is
> printed.
> >
> > Also, the time it takes to perform the install is equivalent to
> > successfully running the pip command.
> >
> > I will report back if I find anything else. Currently doing the
> > explicit install via pip install -r
> sdks/python/build-requirements.txt
> >
> > Thanks,
> > Thomas
> >
> > WARNING:root:Installing grpcio-tools took 269.27 seconds.
> > INFO:gen_protos:Regenerating Python proto definitions (no output
> files).
> > Process Process-1:
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in
> > _bootstrap
> > self.run()
> >   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in
> run
> > self._target(*self._args, **self._kwargs)
> >   File
> >
>  "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> > 378, in _install_grpcio_tools_and_generate_proto_files
> > generate_proto_files(force=force)
> >   File
> >
>  "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> > 315, in generate_proto_files
> > protoc_gen_mypy = _find_protoc_gen_mypy()
> >   File
> >
>  "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> > 233, in _find_protoc_gen_mypy
> > (fname, ', '.join(search_paths)))
> > RuntimeError: Could not find protoc-gen-mypy in
> > /code/venvs/venv2/bin, /code/venvs/venv2/bin, /code/venvs/venv3/bin,
> > /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
> > Traceback (most recent call last):
> >   File "setup.py", line 311, in 
> > 'mypy': generate_protos_first(mypy),
> >   File
> >
>  "/code/venvs/venv2/local/lib/python2.7/site-packages/setuptools/__init__.py",
> > line 129, in setup
> > return distutils.core.setup(**attrs)
> >   File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
> > dist.run_commands()
> >   File "/usr/lib/python2.7/distutils/dist.py", line 953, in
> run_commands
> > self.run_command(cmd)
> >   File "/usr/lib/python2.7/distutils/dist.py", line 972, in
> run_command
> > cmd_obj.run()
> >   File
> >
>  "/code/venvs/venv2/local/lib/python2.7/site-packages/wheel/bdist_wheel.py",
> > line 204, in run
> > self.run_command('build')
> >   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in
> run_command
> > self.distribution.run_command(command)
> >   File "/usr/lib/python2.7/distutils/dist.py", line 972, in
> run_command
> > cmd_obj.run()
> >   File "/usr/lib/python2.7/distutils/command/build.py", line 128, in
> run
> > self.run_command(cmd_name)
> >   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in
> run_command
> > self.distribution.run_command(command)
> >   File "/usr/lib/python2.7/distutils/dist.py", line 972, in
> run_command
> > cmd_obj.run()
> >   File "setup.py", line 235, in run
> > gen_protos.generate_proto_files()
> >   File
> >
>  "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> > 310, in generate_proto_files
> > raise ValueError("Proto generation failed (see log for
> details).")
> > ValueError: Proto generation failed (see log for details).
> >
> >
> > On Thu, May 7, 2020 at 2:25 PM Udi Meiri  > 

RE: Unit-testing BEAM pipelines with PROCESSING_TIME timers

2020-05-11 Thread Robert.Butcher
Many thanks, Darshan!  I’ve made equivalent changes to my test case and it’s 
working fine.

Kind regards,

Rob

From: Darshan Jani [mailto:darshanjani...@gmail.com]
Sent: 11 May 2020 15:08
To: dev@beam.apache.org
Subject: Re: Unit-testing BEAM pipelines with PROCESSING_TIME timers


Hi Robert,

I found this sample test with Timer on processing time.
From the error, I assume there may be a problem with what you are asserting in 
your PAssert.

https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L3633-L3665

I ran it locally and it runs fine.

-Regards
Darshan

On Mon, May 11, 2020 at 5:28 PM 
mailto:robert.butc...@natwestmarkets.com>> 
wrote:
I have a BEAM DoFn that I’m attempting to unit test.  It involves using a timer 
based on processing time and I’ve not managed to get it to fire.  The relevant 
code excerpts are as follows:

@TimerId("timer")
private final TimerSpec timer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);


@ProcessElement
public void process(@TimerId("timer") Timer timer) {
// Set a processing time timer to fire in 5 seconds so we can poll BigQuery
timer.offset(Duration.standardSeconds(5)).setRelative();
}


@OnTimer("timer")
public void onTimer() {
System.out.println("In onTimer");

When I use a TestPipeline with an appropriate PAssert, it always results in the 
following exception:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.util.NoSuchElementException

  at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
  at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
  at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
  at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
  at 
com.nwm.foundry.atomic.AtomicCommitFnTest.shouldGenerateCorrectEvent(AtomicCommitFnTest.java:28)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
  at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
  at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
  at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.util.NoSuchElementException
  at java.util.ArrayList$Itr.next(ArrayList.java:862)
  at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
  at 

Re: Unit-testing BEAM pipelines with PROCESSING_TIME timers

2020-05-11 Thread Darshan Jani
Hi Robert,

I found this sample test with Timer on processing time.
From the error, I assume there may be a problem with what you are asserting
in your PAssert.

https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L3633-L3665

I ran it locally and it runs fine.
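
For reference, a minimal sketch of that pattern (the DoFn name, element types
and durations below are illustrative, not taken from the linked test). The key
point is that TestStream drives a synthetic processing-time clock, so the
timer set with a 5 second offset actually fires before the assertion runs:

// Assumes a DoFn<KV<String, Integer>, String> that, as in your snippet, sets a
// processing-time timer in @ProcessElement and emits "timer fired" from @OnTimer.
TestStream<KV<String, Integer>> stream =
    TestStream.create(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of()))
        .addElements(KV.of("key", 1))
        // Advance the synthetic processing-time clock past the timer deadline.
        .advanceProcessingTime(Duration.standardSeconds(6))
        .advanceWatermarkToInfinity();

PCollection<String> output =
    pipeline.apply(stream).apply(ParDo.of(new MyProcessingTimeTimerFn()));

PAssert.that(output).containsInAnyOrder("timer fired");
pipeline.run();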

-Regards
Darshan

On Mon, May 11, 2020 at 5:28 PM  wrote:

> I have a BEAM DoFn that I’m attempting to unit test.  It involves using a
> timer based on *processing time* and I’ve not managed to get it to fire.
> The relevant code excerpts are as follows:
>
>
>
> @TimerId("timer")
> private final TimerSpec timer = TimerSpecs.*timer*(TimeDomain.
> *PROCESSING_TIME*);
>
>
>
> @ProcessElement
> public void process(@TimerId("timer") Timer timer) {
> // Set a processing time timer to fire in 5 seconds so we can poll 
> BigQuery
> timer.offset(Duration.*standardSeconds*(5)).setRelative();
> }
>
>
>
> @OnTimer("timer")
> public void onTimer() {
> System.*out*.println("In onTimer");
>
>
>
> When I use a TestPipeline with an appropriate PAssert, it always results
> in the following exception:
>
>
>
> org.apache.beam.sdk.Pipeline$PipelineExecutionException:
> java.util.NoSuchElementException
>
>
>
>   at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
>
>   at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
>
>   at
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
>
>   at
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
>
>   at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>
>   at
> org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>
>   at
> org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>
>   at
> com.nwm.foundry.atomic.AtomicCommitFnTest.shouldGenerateCorrectEvent(AtomicCommitFnTest.java:28)
>
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>   at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
>   at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>   at java.lang.reflect.Method.invoke(Method.java:498)
>
>   at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>
>   at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>
>   at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>
>   at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>
>   at
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
>
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>
>   at
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>
>   at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>
>   at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>
>   at
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>
>   at
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>
>   at
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>
>   at
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
>
> Caused by: java.util.NoSuchElementException
>
>   at java.util.ArrayList$Itr.next(ArrayList.java:862)
>
>   at
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
>
>   at
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:254)
>
>   at
> org.apache.beam.sdk.testing.PAssert$SingletonCheckerDoFn.processElement(PAssert.java:1417)
>
>
>
> Swapping the timer for an EVENT_TIME timer works fine.
>
>
>
> Is there a trick I’m missing here?
>
>
>
> Kind regards,
>
>
>
> Rob
>
>
>
>
>
> *Robert Butcher*
>
> *Technical Architect | Foundry/SRS | NatWest Markets*
>
> WeWork, 10 Devonshire Square, London, 

Re: Jenkins jobs not running for my PR 10438

2020-05-11 Thread Yoshiki Obata
Thank you Hannah!

And could anyone trigger these checks again in
https://github.com/apache/beam/pull/11656 ?

Run Portable_Python PreCommit
Run Python 3.5 PostCommit
Run Python 3.6 PostCommit
Run Python 3.7 PostCommit

Regards
yoshiki

On Mon, May 11, 2020 at 12:54 Hannah Jiang wrote:
>
> It is done. Some more tests were triggered automatically when I commented on
> the PR.
>
>
> On Sun, May 10, 2020 at 5:37 AM Yoshiki Obata  wrote:
>>
>> Hi Beam committers
>>
>> I would appreciate if you could trigger following 5 tests in
>> https://github.com/apache/beam/pull/11656
>>
>> Run Portable_Python PreCommit
>> Run Python PreCommit
>> Run Python 3.5 PostCommit
>> Run Python 3.6 PostCommit
>> Run Python 3.7 PostCommit
>>
>> Regards
>> yoshiki
>>
>> On Tue, May 5, 2020 at 0:56 Robert Bradshaw wrote:
>> >
>> > Done.
>> >
>> > On Mon, May 4, 2020 at 7:35 AM Rehman Murad Ali
>> >  wrote:
>> > >
>> > > Hi Beam committers,
>> > >
>> > > Would you please trigger the basic checks as well as validatesRunner 
>> > > check for this PR?
>> > > https://github.com/apache/beam/pull/11350
>> > >
>> > >
>> > > Thanks & Regards
>> > >
>> > > Rehman Murad Ali
>> > > Software Engineer
>> > > Mobile: +92 3452076766
>> > > Skype: rehman.muradali
>> > >
>> > >
>> > >
>> > > On Fri, May 1, 2020 at 5:11 PM Ismaël Mejía  wrote:
>> > >>
>> > >> done
>> > >>
>> > >> On Fri, May 1, 2020 at 5:31 AM Tomo Suzuki  wrote:
>> > >> >
>> > >> > Hi Beam committers,
>> > >> >
>> > >> > Would you trigger the precommit checks for this PR?
>> > >> > https://github.com/apache/beam/pull/11586
>> > >> >
>> > >> > Regards,
>> > >> > Tomo


Re: Beam 2.21 release update

2020-05-11 Thread Maximilian Michels
FYI I've created this issue and marked it as a blocker:
https://jira.apache.org/jira/browse/BEAM-9947

Essentially, the timer encoding is broken for all non-standard key
coders. The fix can be found here: https://github.com/apache/beam/pull/11658

-Max

On 08.05.20 18:53, Udi Meiri wrote:
> +Chad Dombrova  , who added _find_protoc_gen_mypy.
> 
> I'm guessing that the code
> in _install_grpcio_tools_and_generate_proto_files creates a kind of
> virtualenv, but it only works well for staging Python modules and not
> binaries like protoc-gen-mypy.
> (I assume there's a reason why it doesn't invoke virtualenv, probably
> since the list of things setup.py can expect to be installed is very
> minimal (setuptools).)
> 
> One solution would be to make these setup.py dependencies explicit in
> pyproject.toml, such that pip installs them before running
> setup.py: https://pip.pypa.io/en/stable/reference/pip/#pep-517-and-518-support
> It would help when using tools like pip ("pip wheel"), but I'm not sure
> what the alternative for "python setup.py sdist" is.
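> 
> A minimal sketch of what that could look like (the exact package list is an
> assumption - it would have to mirror build-requirements.txt):
> 
> # pyproject.toml (PEP 518) - hypothetical contents
> [build-system]
> requires = ["setuptools", "wheel", "grpcio-tools", "mypy-protobuf"]
> build-backend = "setuptools.build_meta"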
> 
> 
> On Thu, May 7, 2020 at 10:40 PM Thomas Weise  > wrote:
> 
> No additional stacktraces. Full error output below.
> 
> It's not clear what is going wrong.
> 
> There isn't any exception from the subprocess execution since the
> "WARNING:root:Installing grpcio-tools took 305.39 seconds." is printed.
> 
> Also, the time it takes to perform the install is equivalent to
> successfully running the pip command.
> 
> I will report back if I find anything else. Currently doing the
> explicit install via pip install -r sdks/python/build-requirements.txt
> 
> Thanks,
> Thomas
> 
> WARNING:root:Installing grpcio-tools took 269.27 seconds.
> INFO:gen_protos:Regenerating Python proto definitions (no output files).
> Process Process-1:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in
> _bootstrap
>     self.run()
>   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
>     self._target(*self._args, **self._kwargs)
>   File
> "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> 378, in _install_grpcio_tools_and_generate_proto_files
>     generate_proto_files(force=force)
>   File
> "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> 315, in generate_proto_files
>     protoc_gen_mypy = _find_protoc_gen_mypy()
>   File
> "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> 233, in _find_protoc_gen_mypy
>     (fname, ', '.join(search_paths)))
> RuntimeError: Could not find protoc-gen-mypy in
> /code/venvs/venv2/bin, /code/venvs/venv2/bin, /code/venvs/venv3/bin,
> /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
> Traceback (most recent call last):
>   File "setup.py", line 311, in 
>     'mypy': generate_protos_first(mypy),
>   File
> 
> "/code/venvs/venv2/local/lib/python2.7/site-packages/setuptools/__init__.py",
> line 129, in setup
>     return distutils.core.setup(**attrs)
>   File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
>     dist.run_commands()
>   File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
>     self.run_command(cmd)
>   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
>     cmd_obj.run()
>   File
> 
> "/code/venvs/venv2/local/lib/python2.7/site-packages/wheel/bdist_wheel.py",
> line 204, in run
>     self.run_command('build')
>   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
>     self.distribution.run_command(command)
>   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
>     cmd_obj.run()
>   File "/usr/lib/python2.7/distutils/command/build.py", line 128, in run
>     self.run_command(cmd_name)
>   File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
>     self.distribution.run_command(command)
>   File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
>     cmd_obj.run()
>   File "setup.py", line 235, in run
>     gen_protos.generate_proto_files()
>   File
> "/src/streamingplatform/beam-release/beam/sdks/python/gen_protos.py", line
> 310, in generate_proto_files
>     raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> 
> 
> On Thu, May 7, 2020 at 2:25 PM Udi Meiri  > wrote:
> 
> It's hard to say without more details what's going on. Ahmet
> you're right that it installs build-requirements.txt and retries
> calling generate_proto_files().
> 
> Thomas, were there additional stacktraces? (after a 

Beam Dependency Check Report (2020-05-11)

2020-05-11 Thread Apache Jenkins Server
ERROR: File 'src/build/dependencyUpdates/beam-dependency-check-report.html' does not exist

Unit-testing BEAM pipelines with PROCESSING_TIME timers

2020-05-11 Thread Robert.Butcher
I have a BEAM DoFn that I'm attempting to unit test.  It involves using a timer 
based on processing time and I've not managed to get it to fire.  The relevant 
code excerpts are as follows:

@TimerId("timer")
private final TimerSpec timer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);


@ProcessElement
public void process(@TimerId("timer") Timer timer) {
// Set a processing time timer to fire in 5 seconds so we can poll BigQuery
timer.offset(Duration.standardSeconds(5)).setRelative();
}


@OnTimer("timer")
public void onTimer() {
System.out.println("In onTimer");

When I use a TestPipeline with an appropriate PAssert, it always results in the 
following exception:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.util.NoSuchElementException

  at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
  at 
org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
  at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
  at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
  at 
com.nwm.foundry.atomic.AtomicCommitFnTest.shouldGenerateCorrectEvent(AtomicCommitFnTest.java:28)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
  at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
  at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
  at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.util.NoSuchElementException
  at java.util.ArrayList$Itr.next(ArrayList.java:862)
  at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302)
  at 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:254)
  at 
org.apache.beam.sdk.testing.PAssert$SingletonCheckerDoFn.processElement(PAssert.java:1417)

Swapping the timer for an EVENT_TIME timer works fine.

Is there a trick I'm missing here?

Kind regards,

Rob


Robert Butcher
Technical Architect | Foundry/SRS | NatWest Markets
WeWork, 10 Devonshire Square, London, EC2M 4AE
Mobile +44 (0) 7414 730866


Re: [DISCUSS] Dealing with @Ignored tests

2020-05-11 Thread Jan Lukavský
I think that we do have Jira issues for ignored tests, so there should be no 
problem with that. The questionable point is that when a test gets 
Ignored, people might consider the problem "less painful" and 
postpone the correct solution until ... forever. I'd just like to 
discuss whether people see this as an issue. If yes, should we do something 
about it; if no, maybe we can create a rule that a test marked as 
Ignored for a long time may be deleted, because it is apparently only 
dead code.


On 5/6/20 6:30 PM, Kenneth Knowles wrote:

Good point.

The raw numbers are available in the test run output. See 
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PreCommit_Java_Cron/2718/testReport/ for 
the "skipped" column.
And you get the same on console or Gradle Scan: 
https://scans.gradle.com/s/ml3jv5xctkrmg/tests?collapse-all

This would be good to review periodically for obvious trouble spots.

But I think you mean something more detailed. Some report with 
columns: Test Suite, Test Method, Jira, Date Ignored, Most Recent Update


I think we can get most of this from Jira, if we just make sure that 
each ignored test has a Jira and they are all labeled in a consistent 
way. That would be the quickest way to get some result, even though it 
is not perfectly automated and audited.
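
As an illustration of such a convention (the JIRA id and test name here are
hypothetical), the @Ignore reason string could always carry the JIRA link,
which makes the ignored tests easy to grep for and to cross-check against Jira:

// Hypothetical example - substitute the real BEAM JIRA id of the flake.
@Ignore("https://issues.apache.org/jira/browse/BEAM-1234: flaky, see JIRA for status")
@Test
public void testSomethingFlaky() {
  ...
}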


Kenn

On Tue, May 5, 2020 at 2:41 PM Jan Lukavský > wrote:


Hi,

it seems we are accumulating test cases (see discussion in [1]) that are
marked as @Ignored (mostly due to flakiness), which is generally
undesirable. Associated JIRAs seem to be open for a long time, and this
might generally cause us to lose code coverage. Would anyone have an
idea of how to visualize these Ignored tests better? My first idea would
be something similar to the "Beam dependency check report", but that seems
to be not the best example (which is a completely different issue :)).

Jan

[1] https://github.com/apache/beam/pull/11614



Re: [REVIEW][please pause website changes] Migrated the Beam website to Hugo

2020-05-11 Thread Nam Bui
Hi,

@Ahmet: Yeah, it's all clear to me. :)
@Robert: Thanks for your ideas and also the script. It really helps with
my work.

Best regards!

On Sat, May 9, 2020 at 2:10 AM Ahmet Altay  wrote:

> This sounds reasonable to me. Thank you. Nam, does it make sense to you?
>
> On Fri, May 8, 2020 at 11:53 AM Robert Bradshaw 
> wrote:
>
>> I'd really like to not see this work go to waste, both the original
>> revision, the further efforts Nam has done in making it more manageable to
>> review, and the work put into reviewing this so far, so we can get the
>> benefits of being on Hugo. How about this for a concrete proposal:
>>
>> (1) We get "standard" approval from one or more committers for the
>> infrastructure changes, just as with any other PR. Brian has
>> already started this, but if others could step up as well that'd be great.
>>
>> (2) Reviewers (and authors) typically count on (or request) sufficient
>> automated test coverage to augment the fact that their eyeballs are
>> fallible, which is something that is missing here (and given the size of
>> the change not easily compensated for by a more detailed manual review).
>> How about we use the script above (or similar) as an automated test to
>> validate the website's contents haven't (materially) changed. I feel we've
>> validated enough that the style looks good via spot checking (which is
>> something that should work on all pages if it works on one). The diff
>> between the current site and the newly generated site should be empty (it
>> might already be [1]), or at least we should get a stamp of approval on the
>> plain-text diff (which should be small), before merging.
>>
>> (3) To make things easier, everyone holds off on making any changes to
>> the old site until a fixed future date (say, next Wednesday). Hopefully we
>> can get it merged by then. If not, a condition for merging would be a
>> commitment incorporating new changes after this date.
>>
>> Does this sound reasonable?
>>
>> - Robert
>>
>>
>>
>> [1] I'd be curious as to how small the diff already is, but my script
>> relies on local directories with the generated HTML, which I don't have
>> handy at the moment.
>>
>>
>>
>> On Fri, May 8, 2020 at 10:45 AM Robert Bradshaw 
>> wrote:
>>
>>> Here's a script that we could run on the old and new sites that should
>>> quickly catch any major issues but not get caught up in formatting minutia.
>>>
>>>
>>>
>>> On Fri, May 8, 2020 at 10:23 AM Robert Bradshaw 
>>> wrote:
>>>
 On Fri, May 8, 2020 at 9:58 AM Aizhamal Nurmamat kyzy <
 aizha...@apache.org> wrote:

> I understand the difficulty, and this certainly comes with lessons
> learned for future similar projects.
>
> To your questions Robert:
> (1 and 2) I will commit to review the text in the resulting pages. I
> will try and use some automation to extract visible text from each page 
> and
> diff it with the current state of the website. I can do this starting next
> week. From some quick research, there seem to be tools that help with this
> analysis (
> https://stackoverflow.com/questions/3286955/compare-two-websites-and-see-if-they-are-equal
> )
>

 At first glance it looks like these tools would give diffs that are
 *larger* than the 47K one we're struggling to review here.


> By remaining in this state, we hold others up from making changes, or
> we increase the amount of work needed after merging to port over changes
> that may be missed. If we move forward, new changes can be done on top of
> the new website.
>

 I agree we don't want to hold others up from making changes. However,
 the amount of work to port changes over seems small in comparison to
 everything else that is being discussed here. (It also provides good
 incentives to reach the bar quickly and has the advantage of falling on the
 right people.) (3) will still take some time.

 If we go this route, we're lowering the bar for doc changes, but not
 removing it.


> (3) This makes sense. Brian, would you be able to spend some time to
> look at the automation changes (build files and scripts) to ensure they
> look fine?
>
> I would also like to write a post mortem to extract lessons learned
> and avoid this situation in the future.
>
>
> On Fri, May 8, 2020 at 9:44 AM Brian Hulette 
> wrote:
>
>> I'm -0 on merging as-is. I have the same concerns as Robert and he's
>> voiced them very well so I won't waste time re-airing them.
>>
>> (2) I spot checked the content, pulled out some common patterns, and
>>> it mostly looks good, but there were also some issues (e.g. several
>>> pages were replaced with the contents from entirely different pages).
>>> I would be more comfortable if, say, a smoke test of comparing the
>>> old
>>> and new sites, with html tags stripped and ignoring whitespace,