Re: New Contributor

2019-03-04 Thread Ismaël Mejía
Done, welcome!

On Tue, Mar 5, 2019 at 1:25 AM Boris Shkolnik  wrote:
>
>
> Hi,
>
> My name is Boris Shkolnik. I am a committer in Hadoop and Samza Apache 
> projects.
> I would like to contribute to beam.
> Could you please add me to the beam project.
>
> My user name is boryas@apache.org
>
> Thanks,
> -Boris.


Re: beam9 bad worker

2019-03-04 Thread Yifan Zou
FYI, beam9 is off.

Re: When we use text to trigger jenkins check job, is there a way to
specify a server?
You cannot direct the job to a node via the GitHub trigger phrase, but you
can specify the agent in the Jenkins DSL scripts by adding a parameter.
See the inventory jobs as examples.

On Mon, Mar 4, 2019 at 3:06 PM Ruoyun Huang  wrote:

> Thanks a lot folks. Looking forward to the fix.
>
> When we use text to trigger the Jenkins check job, is there a way to specify
> a server?
>
> On Mon, Mar 4, 2019 at 2:06 PM Yifan Zou  wrote:
>
>> I am looking into the error and will disconnect the beam9 to stop
>> breaking tests.
>>
>> On Mon, Mar 4, 2019 at 2:00 PM Pablo Estrada  wrote:
>>
>>> I've talked with Yifan, and I believe he's looking into it. : )
>>> Best
>>> -P.
>>>
>>> On Mon, Mar 4, 2019 at 1:55 PM Ankur Goenka  wrote:
>>>
 Beam9 is failing all the scheduled jobs. Can we reboot the machine?

>>>
>
> --
> 
> Ruoyun  Huang
>
>


New Contributor

2019-03-04 Thread Boris Shkolnik
Hi,

My name is Boris Shkolnik. I am a committer in Hadoop and Samza Apache
projects.
I would like to contribute to beam.
Could you please add me to the beam project.

My user name is boryas@apache.org

Thanks,
-Boris.


Re: Website tests strangely broken

2019-03-04 Thread Thomas Weise
Perhaps exclude the JIRA links from link checking?
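
For reference, the retry approach suggested further down in this thread could
be sketched as follows. This is a minimal illustration, not the actual link
checker: the function name, attempt count, and backoff values are all made up.

```python
import time

def with_retries(fn, attempts=3, backoff=1.0):
    """Call fn(), retrying with exponential backoff on any failure.

    A transient dropped connection (as JIRA seems to produce) then only
    fails the check if it persists across every attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # retries exhausted; surface the real error
            time.sleep(backoff * (2 ** (attempt - 1)))

# Hypothetical usage inside a link checker:
#   status = with_retries(lambda: check_link(url), attempts=3)
```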


On Mon, Mar 4, 2019 at 9:01 AM Pablo Estrada  wrote:

> Fair enough. Thanks for your feedback. I'll try to look into that.
> Best
> -P.
>
> On Mon, Mar 4, 2019 at 2:30 AM Maximilian Michels  wrote:
>
>> JIRA is quite flaky and drops connections occasionally. Since we check
>> hundreds of links in a row, the failures are not surprising.
>>
>> Let's check if we can introduce more retries. I'll try to talk with
>> INFRA to see if they can improve JIRA performance.
>>
>> Thanks,
>> Max
>>
>> On 02.03.19 00:54, Ruoyun Huang wrote:
>> > The log says it ran on 880 external links, but only shows 100ish
>> > (and the number varies across runs) failure messages.  Likely it is
>> > just flaky due to an unstable connection on JIRA's site?
>> >
>> > Not sure how our tests are organized, but maybe retrying the HTTP
>> > requests would help?
>> >
>> > On Fri, Mar 1, 2019 at 12:46 PM Pablo Estrada wrote:
>> >
>> > Hello all,
>> > the website tests are broken. I've filed BEAM-6760 to track fixing
>> > them, but I wanted to see if anyone has any idea about why it may be
>> > failing.
>> >
>> > It's been broken for a few days:
>> > https://builds.apache.org/job/beam_PreCommit_Website_Cron/
>> >
>> > And looking at the failures, it seems that they represent broken
>> > links:
>> >
>> https://builds.apache.org/job/beam_PreCommit_Website_Cron/725/console
>> >
>> > But looking at each of the links opens their website without
>> > problems.
>> >
>> > It may be some environmental temporary issue, but why would it fail
>> > consistently for the last few days then?
>> > Thoughts?
>> > Thanks
>> > -P.
>> >
>> >
>> >
>> > --
>> > 
>> > Ruoyun  Huang
>> >
>>
>


Re: beam9 bad worker

2019-03-04 Thread Ruoyun Huang
Thanks a lot folks. Looking forward to the fix.

When we use text to trigger the Jenkins check job, is there a way to specify
a server?

On Mon, Mar 4, 2019 at 2:06 PM Yifan Zou  wrote:

> I am looking into the error and will disconnect the beam9 to stop breaking
> tests.
>
> On Mon, Mar 4, 2019 at 2:00 PM Pablo Estrada  wrote:
>
>> I've talked with Yifan, and I believe he's looking into it. : )
>> Best
>> -P.
>>
>> On Mon, Mar 4, 2019 at 1:55 PM Ankur Goenka  wrote:
>>
>>> Beam9 is failing all the scheduled jobs. Can we reboot the machine?
>>>
>>

-- 

Ruoyun  Huang


Re: [ANNOUNCE] New committer announcement: Michael Luckey

2019-03-04 Thread Mark Liu
Congrats Michael!

On Mon, Mar 4, 2019 at 10:03 AM Joana Filipa Bernardo Carrasqueira <
joanafil...@google.com> wrote:

> Congratulations Michael!! Welcome!
>
> On Thu, Feb 28, 2019 at 3:55 PM Daniel Oliveira 
> wrote:
>
>> Congrats Michael!
>>
>> On Thu, Feb 28, 2019 at 3:12 AM Maximilian Michels 
>> wrote:
>>
>>> Welcome, it's great to have you onboard Michael!
>>>
>>> On 28.02.19 11:46, Michael Luckey wrote:
>>> > Thanks to all of you for the warm welcome. Really happy to be part of
>>> > this great community!
>>> >
>>> > michel
>>> >
>>> > On Thu, Feb 28, 2019 at 8:39 AM David Morávek wrote:
>>> >
>>> > Congrats Michael!
>>> >
>>> > D.
>>> >
>>> >  > On 28 Feb 2019, at 03:27, Ismaël Mejía wrote:
>>> >  >
>>> >  > Congratulations Michael, and thanks for all the contributions!
>>> >  >
>>> >  >> On Wed, Feb 27, 2019 at 6:30 PM Ankur Goenka <goe...@google.com> wrote:
>>> >  >>
>>> >  >> Congratulations Michael!
>>> >  >>
>>> >  >>> On Wed, Feb 27, 2019 at 2:25 PM Thomas Weise <thomas.we...@gmail.com> wrote:
>>> >  >>>
>>> >  >>> Congrats Michael!
>>> >  >>>
>>> >  >>>
>>> >   On Wed, Feb 27, 2019 at 12:41 PM Gleb Kanterov <g...@spotify.com> wrote:
>>> >  
>>> >   Congratulations and welcome!
>>> >  
>>> >  > On Wed, Feb 27, 2019 at 8:57 PM Connell O'Callaghan <conne...@google.com> wrote:
>>> >  >
>>> >  > Excellent thank you for sharing Kenn!!!
>>> >  >
>>> >  > Michael congratulations for this recognition of your
>>> > contributions to advancing BEAM
>>> >  >
>>> >  >> On Wed, Feb 27, 2019 at 11:52 AM Kenneth Knowles <k...@apache.org> wrote:
>>> >  >>
>>> >  >> Hi all,
>>> >  >>
>>> >  >> Please join me and the rest of the Beam PMC in welcoming a
>>> > new committer: Michael Luckey
>>> >  >>
>>> >  >> Michael has been contributing to Beam since early 2017. He
>>> > has fixed many build and developer environment issues, noted and
>>> > root-caused breakages on master, generously reviewed many others'
>>> > changes to the build. In consideration of Michael's contributions,
>>> > the Beam PMC trusts Michael with the responsibilities of a Beam
>>> > committer [1].
>>> >  >>
>>> >  >> Thank you, Michael, for your contributions.
>>> >  >>
>>> >  >> Kenn
>>> >  >>
>>> >  >> [1]
>>> >
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>> >  
>>> >  
>>> >  
>>> >   --
>>> >   Cheers,
>>> >   Gleb
>>> >
>>>
>>
>
> --
>
> *Joana Carrasqueira*
>
> Cloud Developer Relations Events Manager
>
> 415-602-2507 Mobile
>
> 1160 N Mathilda Ave, Sunnyvale, CA 94089
>
>
>


Re: [ANNOUNCE] New committer announcement: Michael Luckey

2019-03-04 Thread Matthias Baetens
Welcome, Michael! :)

On Mon, 4 Mar 2019 at 18:03, Joana Filipa Bernardo Carrasqueira <
joanafil...@google.com> wrote:

> Congratulations Michael!! Welcome!
>
> On Thu, Feb 28, 2019 at 3:55 PM Daniel Oliveira 
> wrote:
>
>> Congrats Michael!
>>
>> On Thu, Feb 28, 2019 at 3:12 AM Maximilian Michels 
>> wrote:
>>
>>> Welcome, it's great to have you onboard Michael!
>>>
>>> On 28.02.19 11:46, Michael Luckey wrote:
>>> > Thanks to all of you for the warm welcome. Really happy to be part of
>>> > this great community!
>>> >
>>> > michel
>>> >
>>> > On Thu, Feb 28, 2019 at 8:39 AM David Morávek wrote:
>>> >
>>> > Congrats Michael!
>>> >
>>> > D.
>>> >
>>> >  > On 28 Feb 2019, at 03:27, Ismaël Mejía wrote:
>>> >  >
>>> >  > Congratulations Michael, and thanks for all the contributions!
>>> >  >
>>> >  >> On Wed, Feb 27, 2019 at 6:30 PM Ankur Goenka <goe...@google.com> wrote:
>>> >  >>
>>> >  >> Congratulations Michael!
>>> >  >>
>>> >  >>> On Wed, Feb 27, 2019 at 2:25 PM Thomas Weise <thomas.we...@gmail.com> wrote:
>>> >  >>>
>>> >  >>> Congrats Michael!
>>> >  >>>
>>> >  >>>
>>> >   On Wed, Feb 27, 2019 at 12:41 PM Gleb Kanterov <g...@spotify.com> wrote:
>>> >  
>>> >   Congratulations and welcome!
>>> >  
>>> >  > On Wed, Feb 27, 2019 at 8:57 PM Connell O'Callaghan <conne...@google.com> wrote:
>>> >  >
>>> >  > Excellent thank you for sharing Kenn!!!
>>> >  >
>>> >  > Michael congratulations for this recognition of your
>>> > contributions to advancing BEAM
>>> >  >
>>> >  >> On Wed, Feb 27, 2019 at 11:52 AM Kenneth Knowles <k...@apache.org> wrote:
>>> >  >>
>>> >  >> Hi all,
>>> >  >>
>>> >  >> Please join me and the rest of the Beam PMC in welcoming a
>>> > new committer: Michael Luckey
>>> >  >>
>>> >  >> Michael has been contributing to Beam since early 2017. He
>>> > has fixed many build and developer environment issues, noted and
>>> > root-caused breakages on master, generously reviewed many others'
>>> > changes to the build. In consideration of Michael's contributions,
>>> > the Beam PMC trusts Michael with the responsibilities of a Beam
>>> > committer [1].
>>> >  >>
>>> >  >> Thank you, Michael, for your contributions.
>>> >  >>
>>> >  >> Kenn
>>> >  >>
>>> >  >> [1]
>>> >
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>> >  
>>> >  
>>> >  
>>> >   --
>>> >   Cheers,
>>> >   Gleb
>>> >
>>>
>>
>
> --
>
> *Joana Carrasqueira*
>
> Cloud Developer Relations Events Manager
>
> 415-602-2507 Mobile
>
> 1160 N Mathilda Ave, Sunnyvale, CA 94089
>
>
>


Re: beam9 bad worker

2019-03-04 Thread Pablo Estrada
I've talked with Yifan, and I believe he's looking into it. : )
Best
-P.

On Mon, Mar 4, 2019 at 1:55 PM Ankur Goenka  wrote:

> Beam9 is failing all the scheduled jobs. Can we reboot the machine?
>


Re: beam9 bad worker

2019-03-04 Thread Yifan Zou
I am looking into the error and will disconnect the beam9 to stop breaking
tests.

On Mon, Mar 4, 2019 at 2:00 PM Pablo Estrada  wrote:

> I've talked with Yifan, and I believe he's looking into it. : )
> Best
> -P.
>
> On Mon, Mar 4, 2019 at 1:55 PM Ankur Goenka  wrote:
>
>> Beam9 is failing all the scheduled jobs. Can we reboot the machine?
>>
>


beam9 bad worker

2019-03-04 Thread Ankur Goenka
Beam9 is failing all the scheduled jobs. Can we reboot the machine?


Re: Beam Summit Europe 2019: CfP

2019-03-04 Thread Pablo Estrada
Thanks to everyone involved organizing this. This is exciting : )
Best
-P.

On Mon, Mar 4, 2019 at 1:27 PM Matthias Baetens 
wrote:

> Hi everyone,
>
> As you might already know, the *Beam Summit Europe 2019* will take place
> in *Berlin* this year on *19-20 June*!
>
> Of course, we would love to have you there. That is why we are opening the
> *Call for Speakers*.
>
> We are looking for people to share use-cases for Beam, do a technical deep
> dive or deliver a workshop on Beam. We have a few standard slots, but don't
> hesitate to do a proposal in case you have something different in mind.
>
> Stay tuned for more announcements - we are working hard on the website,
> which should be up and running soon. And of course, don't hesitate to reach
> out if you have any questions or suggestions!
>
> Looking forward to seeing you in person later this year.
>
> Best regards,
> The Beam Summit organising team
>


Beam Summit Europe 2019: CfP

2019-03-04 Thread Matthias Baetens
Hi everyone,

As you might already know, the *Beam Summit Europe 2019* will take place in
*Berlin* this year on *19-20 June*!

Of course, we would love to have you there. That is why we are opening
the *Call for Speakers*.

We are looking for people to share use-cases for Beam, do a technical deep
dive or deliver a workshop on Beam. We have a few standard slots, but don't
hesitate to do a proposal in case you have something different in mind.

Stay tuned for more announcements - we are working hard on the website,
which should be up and running soon. And of course, don't hesitate to reach
out if you have any questions or suggestions!

Looking forward to seeing you in person later this year.

Best regards,
The Beam Summit organising team


Re: [VOTE] Release 2.11.0, release candidate #2

2019-03-04 Thread Ahmet Altay
Thank you for the additional votes and validations.

Update: Binaries are pushed. Website updates are blocked on an issue that
is preventing beam-site changes from being synced to the Beam website
(INFRA-17953). I am waiting for that to be resolved before sending an
announcement.

On Mon, Mar 4, 2019 at 3:00 AM Robert Bradshaw  wrote:

> I see the vote has passed, but +1 (binding) from me as well.
>
> On Mon, Mar 4, 2019 at 11:51 AM Jean-Baptiste Onofré 
> wrote:
> >
> > +1 (binding)
> >
> > Tested with beam-samples.
> >
> > Regards
> > JB
> >
> > On 26/02/2019 10:40, Ahmet Altay wrote:
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #2 for the version
> > > 2.11.0, as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > >  [2], which is signed with the key with
> > > fingerprint 64B84A5AD91F9C20F5E9D9A7D62E71416096FA00 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "v2.11.0-RC2" [5],
> > > * website pull request listing the release [6] and publishing the API
> > > reference manual [7].
> > > * Python artifacts are deployed along with the source release to the
> > > dist.apache.org  [2].
> > > * Validation sheet with a tab for 2.11.0 release to help with
> validation
> > > [8].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Ahmet
> > >
> > > [1]
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344775
> > > [2] https://dist.apache.org/repos/dist/dev/beam/2.11.0/
> > > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1064/
> > > [5] https://github.com/apache/beam/tree/v2.11.0-RC2
> > > [6] https://github.com/apache/beam/pull/7924
> > > [7] https://github.com/apache/beam-site/pull/587
> > > [8]
> > >
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=542393513
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>


Re: Beam Summits!

2019-03-04 Thread Joana Filipa Bernardo Carrasqueira
If you're attending Berlin Buzzwords, the Beam Summit will take place right
after this conference on 19-20 June. Save the date!

On Fri, Mar 1, 2019 at 4:09 PM Thomas Weise  wrote:

> Update: organizers are looking for new dates for the Summit in SF,
> currently trending towards October.
>
> For Beam Summit Europe see:
> https://twitter.com/matthiasbaetens/status/1098854758893273088
>
> On Wed, Jan 23, 2019 at 9:09 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Hi All,
>>
>> PMC approval still pending for Summit in SF (so things may change), but
>> wanted to get a preliminary CfP out there to start to get sense of interest
>> -- given that the targeted dates are approaching.  Much of this
>> delay/uncertainty was my fault and I should have done more before the holidays
>> and my long vacation from the end of December through mid-January.  This CfP
>> will remain open for some time, and upon/after approval will make sure to
>> give notice for a CfP deadline.
>>
>> Please submit talks via:
>>
>> https://docs.google.com/forms/d/e/1FAIpQLSfD0qhoS2QrDbtK1E85gATGQCgRGKhQcLIkiiAsPW9G_7Um_Q/viewform?usp=sf_link
>>
>> Would very much encourage anyone that can lead
>> hands-on/tutorials/workshops for full day, half-day, focused couple hours,
>> etc to apply, as well as any technical talks and/or use cases.  Again,
>> tentative dates(s) 3 and 4 April 2019.
>>
>> Thanks,
>> Austin
>>
>>
>> On Mon, Jan 21, 2019 at 7:58 PM Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Other projects/Summits like Kafka and Spark offer add-on days to summits
>>> for training.  I'm wondering the appetite/interest for hands-on sessions
>>> for working with Beam, and whether we think that'd be helpful.  Are there
>>> people that would benefit from a beginning with Beam day, or a more
>>> advanced/specialized session.  This was on the original agenda for London,
>>> but hadn't materialized, seeing if we think there is interest to make this
>>> worth putting together/making-available.
>>>
>>> Furthermore, it had been mentioned that an introduction to contributing
>>> to Beam might also be beneficial.  Also curious to hear whether that would
>>> be of interest to people here (or to those whom people here know but who
>>> aren't following these distribution channels themselves -- since
>>> following dev@ or even user@ is potentially a more focused selection of
>>> those with an interest in Beam).
>>>
>>> Thanks,
>>> Austin
>>>
>>>
>>>
>>> On Wed, Dec 19, 2018 at 3:05 PM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
 Hi All,

 I really enjoyed Beam Summit in London (Thanks Matthias!), and there
 was much enthusiasm for continuations.  We had selected that location in a
 large part due to the growing community there, and we have users in a
 variety of locations.  In our 2019 calendar,
 https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/
 shared in the past weeks, 3 Summits are tentatively slotted for this year.
 Wanting to start running this by the group to get input.

 * Beam Summit NA, in San Francisco, approx 3 April 2019 (following
 Flink Forward).  I can organize.
 * Beam Summit Europe, in Stockholm, this was the runner up in voting
 falling behind London.  Or perhaps Berlin?  October-ish 2019
 * Beam Summit Asia, in Tokyo ??

 What are general thoughts on locations/dates?

 Looking forward to convening in person soon.

 Cheers,
 Austin

>>>

-- 

*Joana Carrasqueira*

Cloud Developer Relations Events Manager

415-602-2507 Mobile

1160 N Mathilda Ave, Sunnyvale, CA 94089


Re: [ANNOUNCE] New committer announcement: Michael Luckey

2019-03-04 Thread Joana Filipa Bernardo Carrasqueira
Congratulations Michael!! Welcome!

On Thu, Feb 28, 2019 at 3:55 PM Daniel Oliveira 
wrote:

> Congrats Michael!
>
> On Thu, Feb 28, 2019 at 3:12 AM Maximilian Michels  wrote:
>
>> Welcome, it's great to have you onboard Michael!
>>
>> On 28.02.19 11:46, Michael Luckey wrote:
>> > Thanks to all of you for the warm welcome. Really happy to be part of
>> > this great community!
>> >
>> > michel
>> >
>> > On Thu, Feb 28, 2019 at 8:39 AM David Morávek wrote:
>> >
>> > Congrats Michael!
>> >
>> > D.
>> >
>> >  > On 28 Feb 2019, at 03:27, Ismaël Mejía wrote:
>> >  >
>> >  > Congratulations Michael, and thanks for all the contributions!
>> >  >
>> >  >> On Wed, Feb 27, 2019 at 6:30 PM Ankur Goenka wrote:
>> >  >>
>> >  >> Congratulations Michael!
>> >  >>
>> >  >>> On Wed, Feb 27, 2019 at 2:25 PM Thomas Weise <thomas.we...@gmail.com> wrote:
>> >  >>>
>> >  >>> Congrats Michael!
>> >  >>>
>> >  >>>
>> >   On Wed, Feb 27, 2019 at 12:41 PM Gleb Kanterov <g...@spotify.com> wrote:
>> >  
>> >   Congratulations and welcome!
>> >  
>> >  > On Wed, Feb 27, 2019 at 8:57 PM Connell O'Callaghan <conne...@google.com> wrote:
>> >  >
>> >  > Excellent thank you for sharing Kenn!!!
>> >  >
>> >  > Michael congratulations for this recognition of your
>> > contributions to advancing BEAM
>> >  >
>> >  >> On Wed, Feb 27, 2019 at 11:52 AM Kenneth Knowles <k...@apache.org> wrote:
>> >  >>
>> >  >> Hi all,
>> >  >>
>> >  >> Please join me and the rest of the Beam PMC in welcoming a
>> > new committer: Michael Luckey
>> >  >>
>> >  >> Michael has been contributing to Beam since early 2017. He
>> > has fixed many build and developer environment issues, noted and
>> > root-caused breakages on master, generously reviewed many others'
>> > changes to the build. In consideration of Michael's contributions,
>> > the Beam PMC trusts Michael with the responsibilities of a Beam
>> > committer [1].
>> >  >>
>> >  >> Thank you, Michael, for your contributions.
>> >  >>
>> >  >> Kenn
>> >  >>
>> >  >> [1]
>> >
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>> >  
>> >  
>> >  
>> >   --
>> >   Cheers,
>> >   Gleb
>> >
>>
>

-- 

*Joana Carrasqueira*

Cloud Developer Relations Events Manager

415-602-2507 Mobile

1160 N Mathilda Ave, Sunnyvale, CA 94089


Re: Apache Beam Newsletter - February/March 2019

2019-03-04 Thread Suneel Marthi
Is this the final draft? We had two Beam talks at Big Data Tech Warsaw last
Wednesday - I can send the updates offline.

On Mon, Mar 4, 2019 at 6:16 PM Rose Nguyen  wrote:

>
> [image: Beam.png]
>
> February-March 2019 | Newsletter
>
> What’s been done
>
> --
>
> Apache Beam 2.10.0 released (by: many contributors)
>
>-
>
>Download the release here.
>
>-
>
>    See the blog post for more details.
>
>
> Apache Beam awarded the 2019 Technology of the Year Award!
>
>-
>
>InfoWorld just awarded Beam the 2019 Technology of the Year Award.
>-
>
>    See this article for more details.
>
>
> Kettle Beam 0.5 released with support for Flink (by: Matt Casters)
>
>-
>
>Kettle now supports Apache Flink as well as Cloud Dataflow and Spark.
>-
>
>    See Matt’s Blog for more details.
>
>
>
> What we’re working on...
>
> --
>
> Apache Beam 2.11.0 release (by: many contributors)
>
>
> Hive Metastore Table provider for SQL (by: Anton Kedin)
>
>-
>
>Support for plugging table providers through Beam SQL API to allow
>obtaining table schemas from external sources.
>-
>
>    See the PR for more details.
>
>
> User Defined Coders for the Beam Go SDK (by: Robert Burke)
>
>-
>
>Working on expanding the variety of user defined types that can be a
>member of a PCollection in the Go SDK.
>-
>
>    See BEAM-3306 for more details.
>
>
> Python 3 (by: Ahmet Altay, Robert Bradshaw, Charles Chen, Mark Liu, Robbe
> Sneyders, Juta Staes, Valentyn Tymofieiev)
>
>-
>
>Beam 2.11.0 is the first release offering partial Python 3 support.
>-
>
>Many thanks to all contributors who helped to reach this milestone.
>-
>
>    IO availability on Python 3 is currently limited and only the Python 3.5
>version has been tested extensively.
>-
>
>Stay tuned on BEAM-1251 for more details.
>
>
> Notebooks for quickstarts and custom I/O (by: David Cavazos)
>
>-
>
>Adding IPython notebooks and snippets
>-
>
>    See [BEAM-6557] for more details.
>
>
>
>
>  New members
> --
>
> New PMC member!
>
>-
>
>Etienne Chauchot, Nantes, France
>
>
> New Committers!
>
>-
>
>Gleb Kanterov, Stockholm, Sweden
>-
>
>Michael Luckey
>
>
> New Contributors!
>
>-
>
>Kyle Weaver, San Francisco, CA
>-
>
>   Would like to help begin implementing portability support for the
>   Spark runner
>   -
>
>Tanay Tummapalli, Delhi, India
>-
>
>   Would like to contribute to Open Source this summer as part of
>   Google Summer of Code
>   -
>
>Brian Hulette, Seattle, WA
>-
>
>   Contributing to Beam Portability
>   -
>
>Michał Walenia, Warsaw, Poland
>-
>
>   Working on integration and load testing
>   -
>
>Daniel Chen, San Francisco, CA
>-
>
>   Working on Beam Samza runner
>
>
>
>  Talks & meetups
> --
>
>
> Plugin Machine Intelligence and Apache Beam with Pentaho - Feb 7 @ London
>
>-
>
>    Watch the How to Run Kettle on Apache Beam video here.
>
>-
>
>    See event details here.
>
>
> Beam @Lyft / Streaming, TensorFlow and use-cases - Feb 7 @ San Francisco,
> CA
>
>-
>
>    Organized by Thomas Weise and Austin Bennett, with speakers Tyler
>Akidau, Robert Crowe, Thomas Weise and Amar Pai
>-
>
>    See event details here and the slides for these presentations:
>    Overview of Apache Beam and TensorFlow Transform (TFX) with Apache
>    Beam, Python Streaming Pipelines with Beam on Flink, and Dynamic
>    pricing of Lyft rides using streaming.
>
> Flink meetup - Feb 21@ Seattle, WA
>
>-
>
>Speakers from Alibaba, Google, and Uber gave talks about Apache Flink
>with Hive, Tensorflow, Beam, and AthenaX.
>-
>
>    See event details here and presentations here.
>
>
>
> Beam 

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Kenneth Knowles
On Mon, Mar 4, 2019 at 9:18 AM Reuven Lax  wrote:

>
>
> On Mon, Mar 4, 2019 at 9:04 AM Kenneth Knowles  wrote:
>
>>
>>
>> On Mon, Mar 4, 2019 at 7:16 AM Maximilian Michels  wrote:
>>
>>> > If you randomly generate shard ids, buffer those until finalization,
>>> finalize a checkpoint so that you never need to re-run that generation,
>>> isn't the result stable from that point onwards?
>>>
>>> Yes, you're right :) For @RequiresStableInput we will always have to
>>> buffer and emit only after a finalized checkpoint.
>>>
>>> 2PC is the better model for Flink, at least in the case of Kafka because
>>> it can offload the buffering to Kafka via its transactions.
>>> RequiresStableInput is a more general solution and it is feasible to
>>> support it in the Flink Runner. However, we have to make sure that
>>> checkpoints are taken frequently to avoid too much memory pressure.
>>
>>
>>> It would be nice to also support 2PC in Beam, i.e. the Runner could
>>> choose to either buffer/materialize input or do a 2PC, but it would also
>>> break the purity of the existing model.
>>>
>>
>> Still digging in to details. I think the "generate random shard ids &
>> buffer" is a tradition but more specific to BigQueryIO or FileIO styles. It
>> doesn't have to be done that way if the target system has special support
>> like Kafka does.
>>
>> For Kafka, can you get the 2PC behavior like this: Upstream step: open a
>> transaction, write a bunch of stuff to it (let Kafka do the buffering) and
>> emit a transaction identifier. Downstream @RequiresStableInput step: close
>> transaction. Again, I may be totally missing something, but I think that
>> this has identical characteristics:
>>
>
> Does Kafka garbage collect this eventually in the case where you crash and
> start again  with a different transaction identifier?
>

I believe that is what I read on the page about Flink's Kafka 2PC, though I
cannot find it any more. What would the alternative be for Kafka? You
always have to be ready for a client that goes away.

Kenn
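
To make the buffering-vs-transaction discussion above concrete, here is a toy,
in-memory sketch of the two-phase pattern described in this thread. This is
not Beam or Kafka API; every name is illustrative. Phase 1 writes into an open
transaction (the sink does the buffering, as Kafka would); phase 2, run only
after the covering checkpoint is finalized, makes the transaction visible.
A retry after finalization is a no-op, and a crashed writer's transaction
eventually times out.

```python
class TwoPhaseSink:
    """Toy two-phase-commit sink (illustrative only, not a real API)."""

    def __init__(self):
        self.committed = []   # records that are durably visible
        self._open = {}       # txn_id -> records buffered in an open txn

    def begin(self, txn_id):
        self._open[txn_id] = []

    def write(self, txn_id, record):
        # Phase 1: buffered inside the transaction, not yet visible.
        self._open[txn_id].append(record)

    def commit(self, txn_id):
        # Phase 2: called after checkpoint finalization. Idempotent, so a
        # retry with the same transaction identifier is safe.
        records = self._open.pop(txn_id, None)
        if records is not None:
            self.committed.extend(records)

    def abort_stale(self):
        # Stand-in for the broker timing out transactions of crashed writers.
        self._open.clear()
```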


>
>>  - Kafka does the buffering
>>  - checkpoint finalization is the driver of latency
>>  - failure before checkpoint finalization means the old transaction sits
>> around and times out eventually
>>  - failure after checkpoint finalization causes retry with the same
>> transaction identifier
>>
>> Kenn
>>
>>
>>>
>>> On 01.03.19 19:42, Kenneth Knowles wrote:
>>> > I think I am fundamentally misunderstanding checkpointing in Flink.
>>> >
>>> > If you randomly generate shard ids, buffer those until finalization,
>>> > finalize a checkpoint so that you never need to re-run that
>>> generation,
>>> > isn't the result stable from that point onwards?
>>> >
>>> > Kenn
>>> >
>>> > On Fri, Mar 1, 2019 at 10:30 AM Maximilian Michels wrote:
>>> >
>>> > Fully agree. I think we can improve the situation drastically. For
>>> > KafkaIO EOS with Flink we need to make these two changes:
>>> >
>>> > 1) Introduce buffering while the checkpoint is being taken
>>> > 2) Replace the random shard id assignment with something
>>> deterministic
>>> >
>>> > However, we won't be able to provide full compatibility with
>>> > RequiresStableInput because Flink only guarantees stable input
>>> after a
>>> > checkpoint. RequiresStableInput requires input at any point in
>>> time to
>>> > be stable. IMHO the only way to achieve that is materializing
>>> output
>>> > which Flink does not currently support.
>>> >
>>> > KafkaIO does not need all the power of RequiresStableInput to
>>> achieve
>>> > EOS with Flink, but for the general case I don't see a good
>>> solution at
>>> > the moment.
>>> >
>>> > -Max
>>> >
>>> > On 01.03.19 16:45, Reuven Lax wrote:
>>> >  > Yeah, the person who was working on it originally stopped
>>> working on
>>> >  > Beam, and nobody else ever finished it. I think it is important
>>> to
>>> >  > finish though. Many of the existing Sinks are only fully
>>> correct for
>>> >  > Dataflow today, because they generate either Reshuffle or
>>> > GroupByKey to
>>> >  > ensure input stability before outputting (in many cases this
>>> code
>>> > was
>>> >  > inherited from before Beam existed). On Flink today, these sinks
>>> > might
>>> >  > occasionally produce duplicate output in the case of failures.
>>> >  >
>>> >  > Reuven
>>> >  >
>>> >  > On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels <m...@apache.org> wrote:
>>> >  >
>>> >  > Circling back to the RequiresStableInput annotation[1]. I've
>>> > done some
>>> >  > prototyping to see how this could be integrated into Flink.
>>> I'm
>>> >  > currently
>>> >  > writing a test based on RequiresStableInput.
>>> >  >
>>> >  > I found out there are already checks in place at the
>>> 

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-03-04 Thread Kenneth Knowles
This effort to improve our triage is still ongoing. To recall:

Issues are no longer automatically assigned, so we have to watch them!

Here's a saved search for issues needing triage:
https://issues.apache.org/jira/issues/?filter=12345682

Anyone can help out. Just make sure the issue is in a suitable component
and someone is assigned or mentioned so they'll get a notification, then
add the "triaged" tag.

You can also subscribe to the filter to watch incoming issues.
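
For anyone scripting against JIRA, the JQL behind the saved filter is not
shown here, but an equivalent query via JIRA's REST search API might look
like the sketch below. The exact JQL is an assumption; adjust it to match
the real filter.

```python
import urllib.parse

JIRA_BASE = "https://issues.apache.org/jira"

def untriaged_search_url(project="BEAM", max_results=50):
    """Build a JIRA REST search URL for unresolved issues that do not yet
    carry the 'triaged' label (a guess at the saved filter's JQL)."""
    jql = (f"project = {project} AND resolution = Unresolved "
           f"AND (labels IS EMPTY OR labels != triaged)")
    query = urllib.parse.urlencode({"jql": jql, "maxResults": max_results})
    return f"{JIRA_BASE}/rest/api/2/search?{query}"
```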

Kenn

On Wed, Feb 6, 2019 at 9:04 PM Kenneth Knowles  wrote:

> I re-triaged most issues where the creation date != last update. I worked
> through everyone with more issues than myself (which I have triaged
> regularly) and a few people with a few fewer issues.
>
> I didn't look as closely at issues that were filed by the assignee. So if
> you filed a bunch of issues that landed on yourself, take a look.
>
> If you have fewer than 30 issues assigned to you, please take a look at
> them now.
>
> Kenn
>
> On Wed, Feb 6, 2019 at 8:15 PM Kenneth Knowles  wrote:
>
>> While we work with infra on this, let's remove the broken system and use
>> tags. It is important that issues coming in are known to be untriaged, so
>> instead of a "Needs Triage" label, we should use "triaged". So I will take
>> these actions that everyone seems to agree on:
>>
>>  - Remove default assignment from Jira configs
>>  - Unassign all issues from people with a huge number
>>  - Add "triaged" tag to issues that are assigned and have some meaningful
>> recent activity
>>
>> I will use trial-and-error to figure out what looks OK for "huge number"
>> and "meaningful recent activity".
>>
>> Kenn
>>
>> On Fri, Jan 11, 2019 at 3:20 PM Kenneth Knowles  wrote:
>>
>>> Filed https://issues.apache.org/jira/browse/INFRA-17628 for the new
>>> status. The rest of 1-3 is self-service I think. I expect step 4 and 5 will
>>> need INFRA as well, but I/we should do what we can to make a very clear
>>> request.
>>>
>>> On Fri, Jan 11, 2019 at 12:54 PM Kenneth Knowles  wrote:
>>>
 It sounds like there's a lot of consensus, pretty much on the action
 items that Max and Ahmet suggested. I will start on these first steps if no
 one objects:

 0) Add a Needs Review status to our workflow
 1) Change new issues to be Unassigned and to be in status "Needs Review"
 2) Unassign all issues from folks with > 30

 And I'm not sure if folks had more to say on these:

 3) Use Wiki of multiple committers per component rather than Jira
 component owners
 4) Automatically unassign stale issues that are just sitting on an
 assignee
 5) Look into SLOs per issue priority and see how we can surface SLO
 violations (reports and pings)

 Kenn

 On Thu, Jan 10, 2019 at 11:41 AM Scott Wegner 
 wrote:

> +1
>
> > 3) Ensure that each component's unresolved issues get looked at
> regularly
>
> This is ideal, but I also don't know how to get to this state.
> Starting with clear component ownership and expectations will help. If the
> triaging process is well-defined, then members of the community can help
> for any components which need additional support.
>
> On Thu, Jan 10, 2019 at 12:21 AM Mikhail Gryzykhin <
> gryzykhin.mikh...@gmail.com> wrote:
>
>> +1 to keep issues unassigned and reevaluate backlog from time to time.
>>
>> We can also auto-unassign if there has been no activity on a ticket for N
>> days. Or we could have an auto-mailed report that highlights stale assigned
>> issues.
>>
>> On Thu, Jan 10, 2019 at 12:10 AM Robert Bradshaw 
>> wrote:
>>
>>> On Thu, Jan 10, 2019 at 3:20 AM Ahmet Altay 
>>> wrote:
>>> >
>>> > I agree with the proposals here. Initial state of "Needs Review"
>>> and blocking releases on untriaged issues will ensure that we will at 
>>> least
>>> look at every new issue once.
>>>
>>> +1.
>>>
>>> I'm more ambivalent about closing stale issues. Unlike PRs, issues
>>> can
>>> be filed as "we should (not forget to) do this" much sooner than
>>> they're actively worked on.
>>>
>>> > On Wed, Jan 9, 2019 at 10:30 AM Maximilian Michels 
>>> wrote:
>>> >>
>>> >> Hi Kenn,
>>> >>
>>> >> As your data shows, default-assigning issues to a single person
>>> does not
>>> >> automatically solve triaging issues. Quite the contrary, it hides
>>> the triage
>>> >> status of an issue.
>>> >>
>>> >>  From the perspective of the Flink Runner, we used to auto-assign
>>> but we got rid
>>> >> of this. Instead, we monitor the newly coming issues and take
>>> actions. We also
>>> >> go through the old ones occasionally. I believe that works fine
>>> for us.
>>> >>
>>> >> The Flink project itself also does not default-assign, newly
>>> created issues are
>>> >> unassigned. There are component leads 

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Reuven Lax
On Mon, Mar 4, 2019 at 9:04 AM Kenneth Knowles  wrote:

>
>
> On Mon, Mar 4, 2019 at 7:16 AM Maximilian Michels  wrote:
>
>> > If you randomly generate shard ids, buffer those until finalization,
>> finalize a checkpoint so that you never need to re-run that generation,
>> isn't the result stable from that point onwards?
>>
>> Yes, you're right :) For @RequiresStableInput we will always have to
>> buffer and emit only after a finalized checkpoint.
>>
>> 2PC is the better model for Flink, at least in the case of Kafka because
>> it can offload the buffering to Kafka via its transactions.
>> RequiresStableInput is a more general solution and it is feasible to
>> support it in the Flink Runner. However, we have to make sure that
>> checkpoints are taken frequently to avoid too much memory pressure.
>
>
>> It would be nice to also support 2PC in Beam, i.e. the Runner could
>> choose to either buffer/materialize input or do a 2PC, but it would also
>> break the purity of the existing model.
>>
>
> Still digging in to details. I think the "generate random shard ids &
> buffer" is a tradition but more specific to BigQueryIO or FileIO styles. It
> doesn't have to be done that way if the target system has special support
> like Kafka does.
>
> For Kafka, can you get the 2PC behavior like this: Upstream step: open a
> transaction, write a bunch of stuff to it (let Kafka do the buffering) and
> emit a transaction identifier. Downstream @RequiresStableInput step: close
> transaction. Again, I may be totally missing something, but I think that
> this has identical characteristics:
>

Does Kafka garbage collect this eventually in the case where you crash and
start again with a different transaction identifier?


>  - Kafka does the buffering
>  - checkpoint finalization is the driver of latency
>  - failure before checkpoint finalization means the old transaction sits
> around and times out eventually
>  - failure after checkpoint finalization causes retry with the same
> transaction identifier
>
> Kenn
>
>
>>
>> On 01.03.19 19:42, Kenneth Knowles wrote:
>> > I think I am fundamentally misunderstanding checkpointing in Flink.
>> >
>> > If you randomly generate shard ids, buffer those until finalization,
>> > finalize a checkpoint so that you never need to re-run that generation,
>> > isn't the result stable from that point onwards?
>> >
>> > Kenn
>> >
>> > On Fri, Mar 1, 2019 at 10:30 AM Maximilian Michels > > > wrote:
>> >
>> > Fully agree. I think we can improve the situation drastically. For
>> > KafkaIO EOS with Flink we need to make these two changes:
>> >
>> > 1) Introduce buffering while the checkpoint is being taken
>> > 2) Replace the random shard id assignment with something
>> deterministic
>> >
>> > However, we won't be able to provide full compatibility with
>> > RequiresStableInput because Flink only guarantees stable input
>> after a
>> > checkpoint. RequiresStableInput requires input at any point in time
>> to
>> > be stable. IMHO the only way to achieve that is materializing output
>> > which Flink does not currently support.
>> >
>> > KafkaIO does not need all the power of RequiresStableInput to
>> achieve
>> > EOS with Flink, but for the general case I don't see a good
>> solution at
>> > the moment.
>> >
>> > -Max
>> >
>> > On 01.03.19 16:45, Reuven Lax wrote:
>> >  > Yeah, the person who was working on it originally stopped
>> working on
>> >  > Beam, and nobody else ever finished it. I think it is important
>> to
>> >  > finish though. Many of the existing Sinks are only fully correct
>> for
>> >  > Dataflow today, because they generate either Reshuffle or
>> > GroupByKey to
>> >  > ensure input stability before outputting (in many cases this code
>> > was
>> >  > inherited from before Beam existed). On Flink today, these sinks
>> > might
>> >  > occasionally produce duplicate output in the case of failures.
>> >  >
>> >  > Reuven
>> >  >
>> >  > On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels <
>> m...@apache.org
>> > 
>> >  > >> wrote:
>> >  >
>> >  > Circling back to the RequiresStableInput annotation[1]. I've
>> > done some
>> >  > protoyping to see how this could be integrated into Flink.
>> I'm
>> >  > currently
>> >  > writing a test based on RequiresStableInput.
>> >  >
>> >  > I found out there are already checks in place at the Runners
>> to
>> >  > throw in
>> >  > case transforms use RequiresStableInput and it's not
>> > supported. However,
>> >  > not a single transform actually uses the annotation.
>> >  >
>> >  > It seems that the effort stopped at some point? Would it make
>> > sense to
>> >  > start annotating KafkaExactlyOnceSink with
>> > 

Apache Beam Newsletter - February/March 2019

2019-03-04 Thread Rose Nguyen

February-March 2019 | Newsletter

What’s been done

--

Apache Beam 2.10.0 released (by: many contributors)

   -

   Download the release here.
   
   -

   See the blog post for more details.


Apache Beam awarded the 2019 Technology of the Year Award!

   -

   InfoWorld just awarded Beam the 2019 Technology of the Year Award.
   -

   See this article for more details.


Kettle Beam 0.5 released with support for Flink (by: Matt Casters)

   -

   Kettle now supports Apache Flink as well as Cloud Dataflow and Spark.
   -

   See Matt’s Blog for more details.



What we’re working on...

--

Apache Beam 2.11.0 release (by: many contributors)


Hive Metastore Table provider for SQL (by: Anton Kedin)

   -

   Support for plugging table providers through Beam SQL API to allow
   obtaining table schemas from external sources.
   -

   See the PR for more details.


User Defined Coders for the Beam Go SDK (by: Robert Burke)

   -

   Working on expanding the variety of user defined types that can be a
   member of a PCollection in the Go SDK.
   -

   See BEAM-3306 for more details.


Python 3 (by: Ahmet Altay, Robert Bradshaw, Charles Chen, Mark Liu, Robbe
Sneyders, Juta Staes, Valentyn Tymofieiev)

   -

   Beam 2.11.0 is the first release offering partial Python 3 support.
   -

   Many thanks to all contributors who helped to reach this milestone.
   -

   IO availability on Python 3 is currently limited, and only Python 3.5
   has been tested extensively.
   -

   Stay tuned on BEAM-1251 for more details.


Notebooks for quickstarts and custom I/O (by: David Cavazos)

   -

   Adding IPython notebooks and snippets
   -

   See [BEAM-6557] for more details.




 New members
--

New PMC member!

   -

   Etienne Chauchot, Nantes, France


New Committers!

   -

   Gleb Kanterov, Stockholm, Sweden
   -

   Michael Luckey


New Contributors!

   -

   Kyle Weaver, San Francisco, CA
   -

  Would like to help begin implementing portability support for the
  Spark runner
  -

   Tanay Tummapalli, Delhi, India
   -

  Would like to contribute to Open Source this summer as part of Google
  Summer of Code
  -

   Brian Hulette, Seattle, WA
   -

  Contributing to Beam Portability
  -

   Michał Walenia, Warsaw, Poland
   -

  Working on integration and load testing
  -

   Daniel Chen, San Francisco, CA
   -

  Working on Beam Samza runner



 Talks & meetups
--


Plugin Machine Intelligence and Apache Beam with Pentaho - Feb 7 @ London

   -

   Watch the How to Run Kettle on Apache Beam video here.

   -

   See event details here.


Beam @Lyft / Streaming, TensorFlow and use-cases - Feb 7 @ San Francisco, CA

   -

   Organized by Thomas Weise and Austin Bennet, with speakers Tyler Akidau,
   Robert Crowe, Thomas Weise and Amar Pai
   -

   See event details here and the slides for these presentations:
   Overview of Apache Beam and TensorFlow Transform (TFX) with Apache
   Beam, Python Streaming Pipelines with Beam on Flink, and Dynamic
   pricing of Lyft rides using streaming.

Flink meetup - Feb 21 @ Seattle, WA

   -

   Speakers from Alibaba, Google, and Uber gave talks about Apache Flink
   with Hive, Tensorflow, Beam, and AthenaX.
   -

   See event details here and presentations here.


Beam Summit Europe 2019 - June 19-20 @ Berlin

   -

   Beam Summit Europe 2019 will take place in Berlin on June 19-20.
   -

   Speaker CfP and other details to follow soon!
   -

   Twitter announcement!
   



 Resources
--

Apache Jira Beginner’s Guide (by: Daniel Oliveira)

   -

   A guide to introduce Beam contributors to the basics of using 

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Kenneth Knowles
On Mon, Mar 4, 2019 at 7:16 AM Maximilian Michels  wrote:

> > If you randomly generate shard ids, buffer those until finalization,
> finalize a checkpoint so that you never need to re-run that generation,
> isn't the result stable from that point onwards?
>
> Yes, you're right :) For @RequiresStableInput we will always have to
> buffer and emit only after a finalized checkpoint.
>
> 2PC is the better model for Flink, at least in the case of Kafka because
> it can offload the buffering to Kafka via its transactions.
> RequiresStableInput is a more general solution and it is feasible to
> support it in the Flink Runner. However, we have to make sure that
> checkpoints are taken frequently to avoid too much memory pressure.


> It would be nice to also support 2PC in Beam, i.e. the Runner could
> choose to either buffer/materialize input or do a 2PC, but it would also
> break the purity of the existing model.
>

Still digging in to details. I think the "generate random shard ids &
buffer" is a tradition but more specific to BigQueryIO or FileIO styles. It
doesn't have to be done that way if the target system has special support
like Kafka does.

For Kafka, can you get the 2PC behavior like this: Upstream step: open a
transaction, write a bunch of stuff to it (let Kafka do the buffering) and
emit a transaction identifier. Downstream @RequiresStableInput step: close
transaction. Again, I may be totally missing something, but I think that
this has identical characteristics:

 - Kafka does the buffering
 - checkpoint finalization is the driver of latency
 - failure before checkpoint finalization means the old transaction sits
around and times out eventually
 - failure after checkpoint finalization causes retry with the same
transaction identifier

Kenn
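For reference, the split described above can be sketched with an in-memory stand-in for Kafka's transactional state. This is purely illustrative — `MockTxnSink` and its methods are invented for the sketch, not Kafka or Beam API — but it shows why retrying with the same transaction identifier after checkpoint finalization is safe: re-opening an id drops the stale attempt, and a duplicate commit is a no-op.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a Kafka-like transactional sink (illustrative only). */
class MockTxnSink {
    private final Map<String, List<String>> pending = new HashMap<>();
    private final List<String> committed = new ArrayList<>();

    /** Upstream step: open a transaction; the "broker" buffers the writes. */
    void begin(String txnId) {
        // Re-opening the same id (a retry) discards the earlier buffered
        // attempt, similar to Kafka fencing/aborting a stale attempt for
        // the same transactional.id.
        pending.put(txnId, new ArrayList<>());
    }

    void write(String txnId, String record) {
        pending.get(txnId).add(record);
    }

    /** Downstream step (after checkpoint finalization): make writes visible.
     *  Committing an already-committed or unknown id is a no-op, so a retry
     *  with the same identifier does not duplicate output. */
    void commit(String txnId) {
        List<String> buffered = pending.remove(txnId);
        if (buffered != null) {
            committed.addAll(buffered);
        }
    }

    List<String> committed() {
        return committed;
    }
}
```

With real Kafka, begin/write/commit would map to a producer configured with a fixed `transactional.id` calling `initTransactions()`, `beginTransaction()`, `send()`, and `commitTransaction()`.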


>
> On 01.03.19 19:42, Kenneth Knowles wrote:
> > I think I am fundamentally misunderstanding checkpointing in Flink.
> >
> > If you randomly generate shard ids, buffer those until finalization,
> > finalize a checkpoint so that you never need to re-run that generation,
> > isn't the result stable from that point onwards?
> >
> > Kenn
> >
> > On Fri, Mar 1, 2019 at 10:30 AM Maximilian Michels  > > wrote:
> >
> > Fully agree. I think we can improve the situation drastically. For
> > KafkaIO EOS with Flink we need to make these two changes:
> >
> > 1) Introduce buffering while the checkpoint is being taken
> > 2) Replace the random shard id assignment with something
> deterministic
> >
> > However, we won't be able to provide full compatibility with
> > RequiresStableInput because Flink only guarantees stable input after
> a
> > checkpoint. RequiresStableInput requires input at any point in time
> to
> > be stable. IMHO the only way to achieve that is materializing output
> > which Flink does not currently support.
> >
> > KafkaIO does not need all the power of RequiresStableInput to achieve
> > EOS with Flink, but for the general case I don't see a good solution
> at
> > the moment.
> >
> > -Max
> >
> > On 01.03.19 16:45, Reuven Lax wrote:
> >  > Yeah, the person who was working on it originally stopped working
> on
> >  > Beam, and nobody else ever finished it. I think it is important to
> >  > finish though. Many of the existing Sinks are only fully correct
> for
> >  > Dataflow today, because they generate either Reshuffle or
> > GroupByKey to
> >  > ensure input stability before outputting (in many cases this code
> > was
> >  > inherited from before Beam existed). On Flink today, these sinks
> > might
> >  > occasionally produce duplicate output in the case of failures.
> >  >
> >  > Reuven
> >  >
> >  > On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels  > 
> >  > >> wrote:
> >  >
> >  > Circling back to the RequiresStableInput annotation[1]. I've
> > done some
> >  > prototyping to see how this could be integrated into Flink. I'm
> >  > currently
> >  > writing a test based on RequiresStableInput.
> >  >
> >  > I found out there are already checks in place at the Runners
> to
> >  > throw in
> >  > case transforms use RequiresStableInput and it's not
> > supported. However,
> >  > not a single transform actually uses the annotation.
> >  >
> >  > It seems that the effort stopped at some point? Would it make
> > sense to
> >  > start annotating KafkaExactlyOnceSink with
> > @RequiresStableInput? We
> >  > could then get rid of the whitelist.
> >  >
> >  > -Max
> >  >
> >  > [1]
> >  >
> >
> https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM
> >  >
> >  >
> >  >
> >  > On 01.03.19 14:28, Maximilian Michels wrote:
> >  >  > Just realized that 

Re: Website tests strangely broken

2019-03-04 Thread Pablo Estrada
Fair enough. Thanks for your feedback. I'll try to look into that.
Best
-P.

On Mon, Mar 4, 2019 at 2:30 AM Maximilian Michels  wrote:

> JIRA is quite flaky and drops connections occasionally. Since we check
> hundreds of links in a row, the failures are not surprising.
>
> Let's check if we can introduce more retries. I'll try to talk with
> INFRA to see if they can improve JIRA performance.
>
> Thanks,
> Max
>
> On 02.03.19 00:54, Ruoyun Huang wrote:
> > The log says running on 880 external links, but only showing 100ish (and
> > the number varies across different runs) failure messages. Likely it is
> > just flaky because the connection to JIRA's site is not stable?
> >
> > Not sure how our tests are organized, but maybe retrying the HTTP request
> would help?
> >
> > On Fri, Mar 1, 2019 at 12:46 PM Pablo Estrada  > > wrote:
> >
> > Hello all,
> > the website tests are broken. I've filed BEAM-6760
> >  to track fixing
> > them, but I wanted to see if anyone has any idea about why it may be
> > failing.
> >
> > It's been broken for a few days:
> > https://builds.apache.org/job/beam_PreCommit_Website_Cron/
> >
> > And looking at the failures, it seems that they represent broken
> links:
> >
> https://builds.apache.org/job/beam_PreCommit_Website_Cron/725/console
> >
> > But looking at each of the links opens their website without
> problems.
> >
> > It may be some environmental temporary issue, but why would it fail
> > consistently for the last few days then?
> > Thoughts?
> > Thanks
> > -P.
> >
> >
> >
> > --
> > 
> > Ruoyun  Huang
> >
>


Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Reuven Lax
On Mon, Mar 4, 2019 at 6:55 AM Maximilian Michels  wrote:

> > Can we do 2? I seem to remember that we had trouble in some cases (e.g.
> in the BigQuery case, there was no obvious way to create a deterministic
> id, which is why we went for a random number followed by a reshuffle). Also
> remember that the user ParDo that is producing data to the sink is not
> guaranteed to be deterministic; the Beam model allows for non-deterministic
> transforms.
>
> I believe we could use something like the worker id to make it
> deterministic, though the worker id can change after a restart. We could
> persist it in Flink's operator state. I do not know if we can come up
> with a Runner-independent solution.
>

If we did this, we would break it on runners that don't have a concept of a
stable worker id :( The Dataflow runner can load balance work at any time
(including moving work around between workers).

>
> > I'm not quite sure I understand. If a ParDo is marked with
> RequiresStableInput, can't the flink runner buffer the input message until
> after the checkpoint is complete and only then deliver it to the ParDo?
>
> You're correct. I thought that it could suffice to only buffer during a
> checkpoint and otherwise rely on the deterministic execution of the
> pipeline and KafkaIO's de-duplication code.
>

Yes, I want to distinguish the KafkaIO case from the general case. It would
be interesting to see if there's something we could add to the Beam model
that would create a better story for Kafka's EOS writes.

>
> In any case, emitting only after finalization of checkpoints gives us
> guaranteed stable input. It also means that the processing is tight to
> the checkpoint interval, the checkpoint duration, and the available memory.
>

This is true; however, isn't it already true for such uses of Flink?


>
> On 01.03.19 19:41, Reuven Lax wrote:
> >
> >
> > On Fri, Mar 1, 2019 at 10:37 AM Maximilian Michels  > > wrote:
> >
> > Fully agree. I think we can improve the situation drastically. For
> > KafkaIO EOS with Flink we need to make these two changes:
> >
> > 1) Introduce buffering while the checkpoint is being taken
> > 2) Replace the random shard id assignment with something
> deterministic
> >
> >
> > Can we do 2? I seem to remember that we had trouble in some cases (e.g.
> > in the BigQuery case, there was no obvious way to create a deterministic
> > id, which is why we went for a random number followed by a reshuffle).
> > Also remember that the user ParDo that is producing data to the sink is
> > not guaranteed to be deterministic; the Beam model allows for
> > non-deterministic transforms.
> >
> >
> > However, we won't be able to provide full compatibility with
> > RequiresStableInput because Flink only guarantees stable input after
> a
> > checkpoint. RequiresStableInput requires input at any point in time
> to
> > be stable.
> >
> >
> > I'm not quite sure I understand. If a ParDo is marked with
> > RequiresStableInput, can't the flink runner buffer the input message
> > until after the checkpoint is complete and only then deliver it to the
> > ParDo? This adds latency of course, but I'm not sure how else to do
> > things correctly with the Beam model.
> >
> > IMHO the only way to achieve that is materializing output
> > which Flink does not currently support.
> >
> > KafkaIO does not need all the power of RequiresStableInput to achieve
> > EOS with Flink, but for the general case I don't see a good solution
> at
> > the moment.
> >
> > -Max
> >
> > On 01.03.19 16:45, Reuven Lax wrote:
> >  > Yeah, the person who was working on it originally stopped working
> on
> >  > Beam, and nobody else ever finished it. I think it is important to
> >  > finish though. Many of the existing Sinks are only fully correct
> for
> >  > Dataflow today, because they generate either Reshuffle or
> > GroupByKey to
> >  > ensure input stability before outputting (in many cases this code
> > was
> >  > inherited from before Beam existed). On Flink today, these sinks
> > might
> >  > occasionally produce duplicate output in the case of failures.
> >  >
> >  > Reuven
> >  >
> >  > On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels  > 
> >  > >> wrote:
> >  >
> >  > Circling back to the RequiresStableInput annotation[1]. I've
> > done some
> >  > prototyping to see how this could be integrated into Flink. I'm
> >  > currently
> >  > writing a test based on RequiresStableInput.
> >  >
> >  > I found out there are already checks in place at the Runners
> to
> >  > throw in
> >  > case transforms use RequiresStableInput and it's not
> > supported. However,
> >  > not a single transform actually uses the annotation.
> >  >
> >  > It 

Re: Merge of vendored Guava (Some PRs need a rebase)

2019-03-04 Thread Ismaël Mejía
That looks interesting but I am not sure if I understand correctly,
isn't the problem that the system API (Bigtable, Cassandra, etc)
exposes guava related stuff? Or in other words, wouldn't the
transitivie version of guava leak anyway?
If it does not I am pretty interested on doing this to fix the
Cassandra IO from leaking too.
https://issues.apache.org/jira/browse/BEAM-5723

On Thu, Feb 28, 2019 at 5:17 PM Kenneth Knowles  wrote:
>
> If someone is using BigTableIO with bigtable-client-core then having 
> BigTableIO and bigtable-client-core both depend on Guava 26.0 is fine, right? 
> Specifically, a user of BigTableIO after 
> https://github.com/apache/beam/pull/7957 will still have non-vendored Guava 
> on the classpath due to the transitive deps of bigtable-client-core.
>
> In any case it seems very wrong for the Beam root project to manage the 
> version of Guava in BigTableIO since the whole point is to be compatible with 
> bigtable-client-core. Would it work to delete our pinned Guava version [1] 
> and chase down all the places it breaks, moving Guava dependencies local to 
> places where an IO or extension must use it for interop? Then you don't need 
> adapters.
>
> In both of the above approaches, diamond dependency problems between IOs are 
> quite possible.
>
> I don't know if we can do better. For example, producing a 
> bigtable-client-core where we have relocated Guava internally and using that 
> could really be an interop nightmare as things that look like the same type 
> would not be. Less likely to be broken would be bigtable-client-core entirely 
> relocated and vendored, but generally IO connectors exchange objects with 
> users and the users would have to use the relocated versions, so that's gross.
>
> Kenn
>
> [1] 
> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L353
>
>
> On Thu, Feb 28, 2019 at 2:29 AM Gleb Kanterov  wrote:
>>
>> For the past week, two independent people have asked me if I can help with 
>> guava NoSuchMethodError in BigtableIO. It turns out we still have a 
>> potential problem with dependencies that don't vendor guava, in this case, 
>> it was bigtable-client-core that depends on guava-26.0. However, the root 
>> cause of the classpath problem was in the usage of a deprecated method from 
>> non-vendored guava in BigtableServiceClientImpl in the code path where we 
>> integrate with bigtable client.
>>
>> I created apache/beam#7957 [1] to fix that. There are a few other IOs where we
>> use non-vendored guava that we can fix using adapters.
>>
>> And there is an unknown number of conflicts between guava versions in our 
>> dependencies that don't vendor it, that as I understand it, could be fixed 
>> by relocating them, in a similar way we do for Calcite [2].
>>
>> [1]: https://github.com/apache/beam/pull/7957
>> [2]: 
>> https://github.com/apache/beam/blob/61de62ecbe8658de866280a8976030a0cb877041/sdks/java/extensions/sql/build.gradle#L30-L39
>>
>> Gleb
>>
>> On Sun, Jan 20, 2019 at 11:43 AM Gleb Kanterov  wrote:
>>>
>>> I didn't look deep into it, but it seems we can put 
>>> .idea/codeInsightSettings.xml into our repository where we blacklist 
>>> packages from auto-import. See an example in 
>>> JetBrains/kotlin/.idea/codeInsightSettings.xml.
>>>
>>> On Sat, Jan 19, 2019 at 8:03 PM Reuven Lax  wrote:

 Bad IDEs automatically generate the wrong import. I think we need to 
 automatically prevent this, otherwise the bad imports will inevitably slip 
 back in.

 Reuven

 On Tue, Jan 15, 2019 at 2:54 AM Łukasz Gajowy  
 wrote:
>
> Great news. Thanks all for this work!
>
> +1 to enforcing this on dependency level as Kenn suggested.
>
> Łukasz
>
> wt., 15 sty 2019 o 01:18 Kenneth Knowles  napisał(a):
>>
>> We can enforce at the dependency level, since it is a compile error. I 
>> think some IDEs and build tools may allow the compile-time classpath to 
>> get polluted by transitive runtime deps, so protecting against bad 
>> imports is also a good idea.
>>
>> Kenn
>>
>> On Mon, Jan 14, 2019 at 8:42 AM Ismaël Mejía  wrote:
>>>
>>> Not yet, we need to add that too, there are still some tasks to be
>>> done, like improving the contribution guide with this info, and documenting
>>> how to generate a src build artifact locally, since I doubt we can
>>> publish that into Apache for copyright reasons.
>>> I will send a message in the future for awareness when most of
>>> the pending tasks are finished.
>>>
>>>
>>> On Mon, Jan 14, 2019 at 3:51 PM Maximilian Michels  
>>> wrote:
>>> >
>>> > Thanks for the heads up, Ismaël! Great to see the vendored Guava 
>>> > version is used
>>> > everywhere now.
>>> >
>>> > Do we already have a Checkstyle rule that prevents people from using 
>>> > the
>>> > unvendored Guava? If not, such a rule 

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Maximilian Michels

If you randomly generate shard ids, buffer those until finalization, finalize a 
checkpoint so that you never need to re-run that generation, isn't the result 
stable from that point onwards?


Yes, you're right :) For @RequiresStableInput we will always have to 
buffer and emit only after a finalized checkpoint.


2PC is the better model for Flink, at least in the case of Kafka because 
it can offload the buffering to Kafka via its transactions. 
RequiresStableInput is a more general solution and it is feasible to 
support it in the Flink Runner. However, we have to make sure that 
checkpoints are taken frequently to avoid too much memory pressure.


It would be nice to also support 2PC in Beam, i.e. the Runner could 
choose to either buffer/materialize input or do a 2PC, but it would also 
break the purity of the existing model.


On 01.03.19 19:42, Kenneth Knowles wrote:

I think I am fundamentally misunderstanding checkpointing in Flink.

If you randomly generate shard ids, buffer those until finalization, 
finalize a checkpoint so that you never need to re-run that generation, 
isn't the result stable from that point onwards?


Kenn

On Fri, Mar 1, 2019 at 10:30 AM Maximilian Michels > wrote:


Fully agree. I think we can improve the situation drastically. For
KafkaIO EOS with Flink we need to make these two changes:

1) Introduce buffering while the checkpoint is being taken
2) Replace the random shard id assignment with something deterministic

However, we won't be able to provide full compatibility with
RequiresStableInput because Flink only guarantees stable input after a
checkpoint. RequiresStableInput requires input at any point in time to
be stable. IMHO the only way to achieve that is materializing output
which Flink does not currently support.

KafkaIO does not need all the power of RequiresStableInput to achieve
EOS with Flink, but for the general case I don't see a good solution at
the moment.

-Max

On 01.03.19 16:45, Reuven Lax wrote:
 > Yeah, the person who was working on it originally stopped working on
 > Beam, and nobody else ever finished it. I think it is important to
 > finish though. Many of the existing Sinks are only fully correct for
 > Dataflow today, because they generate either Reshuffle or
GroupByKey to
 > ensure input stability before outputting (in many cases this code
was
 > inherited from before Beam existed). On Flink today, these sinks
might
 > occasionally produce duplicate output in the case of failures.
 >
 > Reuven
 >
 > On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels mailto:m...@apache.org>
 > >> wrote:
 >
 >     Circling back to the RequiresStableInput annotation[1]. I've
done some
 >     prototyping to see how this could be integrated into Flink. I'm
 >     currently
 >     writing a test based on RequiresStableInput.
 >
 >     I found out there are already checks in place at the Runners to
 >     throw in
 >     case transforms use RequiresStableInput and it's not
supported. However,
 >     not a single transform actually uses the annotation.
 >
 >     It seems that the effort stopped at some point? Would it make
sense to
 >     start annotating KafkaExactlyOnceSink with
@RequiresStableInput? We
 >     could then get rid of the whitelist.
 >
 >     -Max
 >
 >     [1]
 >

https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM
 >
 >
 >
 >     On 01.03.19 14:28, Maximilian Michels wrote:
 >      > Just realized that transactions do not spawn multiple
elements in
 >      > KafkaExactlyOnceSink. So the proposed solution to stop
processing
 >      > elements while a snapshot is pending would work.
 >      >
 >      > It is certainly not optimal in terms of performance for
Flink and
 >     poses
 >      > problems when checkpoints take long to complete, but it
would be
 >      > worthwhile to implement this to make use of the EOS feature.
 >      >
 >      > Thanks,
 >      > Max
 >      >
 >      > On 01.03.19 12:23, Maximilian Michels wrote:
 >      >> Thanks you for the prompt replies. It's great to see that
there is
 >      >> good understanding of how EOS in Flink works.
 >      >>
 >      >>> This is exactly what RequiresStableInput is supposed to
do. On the
 >      >>> Flink runner, this would be implemented by delaying
processing
 >     until
 >      >>> the current checkpoint is done.
 >      >>
 >      >> I don't think that works because we have no control over
the Kafka
 >      >> transactions. Imagine:
 >      >>
 >      >> 1) ExactlyOnceWriter writes records to Kafka and commits,
then
 >     

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-04 Thread Maximilian Michels

Can we do 2? I seem to remember that we had trouble in some cases (e.g. in the
BigQuery case, there was no obvious way to create a deterministic id, which is 
why we went for a random number followed by a reshuffle). Also remember that 
the user ParDo that is producing data to the sink is not guaranteed to be 
deterministic; the Beam model allows for non-deterministic transforms.


I believe we could use something like the worker id to make it 
deterministic, though the worker id can change after a restart. We could 
persist it in Flink's operator state. I do not know if we can come up 
with a Runner-independent solution.
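As a sketch of what such a deterministic assignment could look like — assuming a stable writerId restored from operator state, which is an assumption and not an existing Beam or Flink API — the shard id could be a pure function of the writer, the checkpoint, and a per-checkpoint sequence number, so a replay reproduces identical ids instead of random ones:

```java
/** Illustrative sketch only: a deterministic shard id derived from state
 *  that survives a restart, rather than a random number. */
class DeterministicShardId {
    static String shardId(String writerId, long checkpointId, long seqInCheckpoint) {
        // Same inputs after a restore yield the same id, so replayed
        // writes land in the same shard and can be de-duplicated.
        return writerId + "-" + checkpointId + "-" + seqInCheckpoint;
    }
}
```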



I'm not quite sure I understand. If a ParDo is marked with RequiresStableInput, 
can't the flink runner buffer the input message until after the checkpoint is 
complete and only then deliver it to the ParDo?


You're correct. I thought that it could suffice to only buffer during a 
checkpoint and otherwise rely on the deterministic execution of the 
pipeline and KafkaIO's de-duplication code.


In any case, emitting only after finalization of checkpoints gives us 
guaranteed stable input. It also means that the processing is tied to
the checkpoint interval, the checkpoint duration, and the available memory.
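The "emit only after checkpoint finalization" idea can be sketched in plain Java (names are illustrative only, not Beam or Flink API; in Flink the flush would be driven by the runner's checkpoint-complete notification):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Sketch: elements are buffered during processing and only drained
 * downstream once the checkpoint covering them has been finalized.
 */
final class BufferingEmitter<T> {
  private final Queue<T> buffer = new ArrayDeque<>();
  private final Consumer<T> downstream;

  BufferingEmitter(Consumer<T> downstream) {
    this.downstream = downstream;
  }

  /** Called for every incoming element; output is withheld. */
  void processElement(T element) {
    buffer.add(element);
  }

  /** Called when the checkpoint covering the buffered elements completes. */
  void notifyCheckpointComplete() {
    while (!buffer.isEmpty()) {
      downstream.accept(buffer.poll());
    }
  }
}

public class BufferingDemo {
  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    BufferingEmitter<String> emitter = new BufferingEmitter<>(out::add);
    emitter.processElement("a");
    emitter.processElement("b");
    // Nothing has been emitted yet: latency is bounded below by the
    // checkpoint interval plus the checkpoint duration.
    System.out.println(out.size()); // prints 0
    emitter.notifyCheckpointComplete();
    System.out.println(String.join(",", out)); // prints a,b
  }
}
```

The buffer also illustrates the memory concern: everything produced between two checkpoints must be held until the second one completes.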


On 01.03.19 19:41, Reuven Lax wrote:



On Fri, Mar 1, 2019 at 10:37 AM Maximilian Michels wrote:


Fully agree. I think we can improve the situation drastically. For
KafkaIO EOS with Flink we need to make these two changes:

1) Introduce buffering while the checkpoint is being taken
2) Replace the random shard id assignment with something deterministic


Can we do 2? I seem to remember that we had trouble in some cases (e.g.
in the BigQuery case, there was no obvious way to create a deterministic 
id, which is why we went for a random number followed by a reshuffle). 
Also remember that the user ParDo that is producing data to the sink is 
not guaranteed to be deterministic; the Beam model allows for 
non-deterministic transforms.



However, we won't be able to provide full compatibility with
RequiresStableInput because Flink only guarantees stable input after a
checkpoint. RequiresStableInput requires input at any point in time to
be stable. 



I'm not quite sure I understand. If a ParDo is marked with 
RequiresStableInput, can't the flink runner buffer the input message 
until after the checkpoint is complete and only then deliver it to the 
ParDo? This adds latency of course, but I'm not sure how else to do 
things correctly with the Beam model.


IMHO the only way to achieve that is materializing output
which Flink does not currently support.

KafkaIO does not need all the power of RequiresStableInput to achieve
EOS with Flink, but for the general case I don't see a good solution at
the moment.

-Max

On 01.03.19 16:45, Reuven Lax wrote:
 > Yeah, the person who was working on it originally stopped working on
 > Beam, and nobody else ever finished it. I think it is important to
 > finish though. Many of the existing Sinks are only fully correct for
 > Dataflow today, because they generate either Reshuffle or
GroupByKey to
 > ensure input stability before outputting (in many cases this code
was
 > inherited from before Beam existed). On Flink today, these sinks
might
 > occasionally produce duplicate output in the case of failures.
 >
 > Reuven
 >
 >     On Fri, Mar 1, 2019 at 7:18 AM Maximilian Michels wrote:
 >
 >     Circling back to the RequiresStableInput annotation[1]. I've
done some
 >     protoyping to see how this could be integrated into Flink. I'm
 >     currently
 >     writing a test based on RequiresStableInput.
 >
 >     I found out there are already checks in place at the Runners to
 >     throw in
 >     case transforms use RequiresStableInput and its not
supported. However,
 >     not a single transform actually uses the annotation.
 >
 >     It seems that the effort stopped at some point? Would it make
sense to
 >     start annotating KafkaExactlyOnceSink with
@RequiresStableInput? We
 >     could then get rid of the whitelist.
 >
 >     -Max
 >
 >     [1]
 >

https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM
 >
 >
 >
 >     On 01.03.19 14:28, Maximilian Michels wrote:
 >      > Just realized that transactions do not spawn multiple
elements in
 >      > KafkaExactlyOnceSink. So the proposed solution to stop
processing
 >      > elements while a snapshot is pending would work.
 >      >
 >      > It is certainly not optimal in terms of performance for
Flink and
 >     poses
 >      > problems when checkpoints take long to complete, but it
would be
 > 
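The annotation proposal above could look roughly like the following. This is a sketch, not the actual KafkaExactlyOnceSink code, and it assumes Beam's @RequiresStableInput method annotation from the design doc; it is not runnable without the Beam SDK on the classpath.

```java
import org.apache.beam.sdk.transforms.DoFn;

// Sketch: declare the stable-input requirement on the DoFn itself so
// runners can support or reject it generically, instead of consulting
// a runner-side whitelist of known sinks.
class ExactlyOnceWriterFn extends DoFn<String, Void> {
  @ProcessElement
  @RequiresStableInput // input must be identical on every replay
  public void processElement(ProcessContext ctx) {
    // begin Kafka transaction, write ctx.element(), commit on success
  }
}
```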

Beam Dependency Check Report (2019-03-04)

2019-03-04 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27 | 2018-12-10 | BEAM-5968
oauth2client | 3.0.0 | 4.1.3 | 2018-12-10 | 2018-12-10 | BEAM-6089

High Priority Dependency Updates Of Beam Java SDK:

Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.rabbitmq:amqp-client | 4.6.0 | 5.6.0 | 2018-03-26 | 2019-01-25 | BEAM-5895
org.apache.rat:apache-rat-tasks | 0.12 | 0.13 | 2016-06-07 | 2018-10-13 | BEAM-6039
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc4 | 2014-10-25 | 2017-12-11 | BEAM-5541
com.amazonaws:aws-java-sdk-kinesis | 1.11.255 | 1.11.510 | 2017-12-23 | 2019-03-01 | BEAM-6330
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.17.0 | 0.21.0 | 2019-02-11 | 2019-03-04 | BEAM-6645
org.conscrypt:conscrypt-openjdk | 1.1.3 | 2.0.0 | 2018-06-04 | 2019-02-13 | BEAM-5748
org.elasticsearch:elasticsearch | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6090
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 7.0.0-beta1 | 2016-10-26 | 2019-02-13 | BEAM-5551
org.elasticsearch.client:elasticsearch-rest-client | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6091
com.google.errorprone:error_prone_annotations | 2.1.2 | 2.3.3 | 2017-10-19 | 2019-02-22 | BEAM-6741
org.elasticsearch.test:framework | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6092
io.grpc:grpc-context | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5897
io.grpc:grpc-protobuf | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5900
io.grpc:grpc-testing | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5902
com.google.code.gson:gson | 2.7 | 2.8.5 | 2016-06-14 | 2018-05-22 | BEAM-5558
org.apache.hbase:hbase-common | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5560
org.apache.hbase:hbase-hadoop-compat | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5561
org.apache.hbase:hbase-hadoop2-compat | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5562
org.apache.hbase:hbase-server | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5563
org.apache.hbase:hbase-shaded-client | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5564
org.apache.hive:hive-cli | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5566
org.apache.hive:hive-common | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5567
org.apache.hive:hive-exec | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5568
org.apache.hive.hcatalog:hive-hcatalog-core | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5569
net.java.dev.javacc:javacc | 4.0 | 7.0.4 | 2006-03-17 | 2018-09-17 | BEAM-5570
javax.servlet:javax.servlet-api | 3.1.0 | 4.0.1 | 2013-04-25 | 2018-04-20 | BEAM-5750
redis.clients:jedis | 2.9.0 | 3.0.1 | 2016-07-22 | 2018-12-27 | BEAM-6125
org.eclipse.jetty:jetty-server | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5752
org.eclipse.jetty:jetty-servlet | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5753
net.java.dev.jna:jna | 4.1.0 | 5.2.0 | 2014-03-06 | 2018-12-23 | BEAM-5573
junit:junit | 4.13-beta-1 | 4.13-beta-2 | 2018-11-25 | 2019-02-02 | BEAM-6127
com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC2 | 2018-03-20 | 2019-02-05 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
org.apache.kudu:kudu-client | 1.4.0 | 1.8.0 | 2017-06-05 | 2018-10-16 | BEAM-5575
io.dropwizard.metrics:metrics-core | 3.1.2 | 4.1.0-rc3 | 2015-04-26 | 2018-12-30 | BEAM-5576
io.grpc:protoc-gen-grpc-java | 1.13.1 | 1.19.0 | 2018-06-21 | 2019-02-27 | BEAM-5903
org.apache.qpid:proton-j | 0.13.1 | 0.31.0 | 2016-07-02 | 2018-11-23 | BEAM-5582

Re: [VOTE] Release 2.11.0, release candidate #2

2019-03-04 Thread Robert Bradshaw
I see the vote has passed, but +1 (binding) from me as well.

On Mon, Mar 4, 2019 at 11:51 AM Jean-Baptiste Onofré  wrote:
>
> +1 (binding)
>
> Tested with beam-samples.
>
> Regards
> JB
>
> On 26/02/2019 10:40, Ahmet Altay wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the version
> > 2.11.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> >  [2], which is signed with the key with
> > fingerprint 64B84A5AD91F9C20F5E9D9A7D62E71416096FA00 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.11.0-RC2" [5],
> > * website pull request listing the release [6] and publishing the API
> > reference manual [7].
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org  [2].
> > * Validation sheet with a tab for 2.11.0 release to help with validation
> > [8].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Ahmet
> >
> > [1]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344775
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.11.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1064/
> > [5] https://github.com/apache/beam/tree/v2.11.0-RC2
> > [6] https://github.com/apache/beam/pull/7924
> > [7] https://github.com/apache/beam-site/pull/587
> > [8]
> > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=542393513
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: [VOTE] Release 2.11.0, release candidate #2

2019-03-04 Thread Jean-Baptiste Onofré
+1 (binding)

Tested with beam-samples.

Regards
JB

On 26/02/2019 10:40, Ahmet Altay wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the version
> 2.11.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
>  [2], which is signed with the key with
> fingerprint 64B84A5AD91F9C20F5E9D9A7D62E71416096FA00 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.11.0-RC2" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org  [2].
> * Validation sheet with a tab for 2.11.0 release to help with validation
> [8].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Ahmet
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344775
> [2] https://dist.apache.org/repos/dist/dev/beam/2.11.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1064/
> [5] https://github.com/apache/beam/tree/v2.11.0-RC2
> [6] https://github.com/apache/beam/pull/7924
> [7] https://github.com/apache/beam-site/pull/587
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=542393513
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Website tests strangely broken

2019-03-04 Thread Maximilian Michels
JIRA is quite flaky and drops connections occasionally. Since we check 
hundreds of links in a row, the failures are not surprising.


Let's check if we can introduce more retries. I'll try to talk with 
INFRA to see if they can improve JIRA performance.


Thanks,
Max
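The retry idea is generic retry-with-backoff. A minimal sketch (the actual website tests use a Ruby-based link checker, so this only illustrates the pattern, not the real implementation; all names are illustrative):

```java
import java.util.concurrent.Callable;

/** Sketch: retry a flaky call a bounded number of times with linear backoff. */
final class Retry {
  static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMillis)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(backoffMillis * attempt); // back off between attempts
        }
      }
    }
    throw last; // all attempts failed; surface the final error
  }
}

public class RetryDemo {
  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Fails twice (simulating dropped JIRA connections), succeeds on the third try.
    String result = Retry.withRetries(() -> {
      if (++calls[0] < 3) throw new RuntimeException("connection reset");
      return "200 OK";
    }, 5, 10);
    System.out.println(result + " after " + calls[0] + " attempts"); // 200 OK after 3 attempts
  }
}
```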

On 02.03.19 00:54, Ruoyun Huang wrote:
The log says it checked 880 external links, but only shows about 100 
failure messages (and the number varies across runs). Likely it is just 
flaky because connections to JIRA's site are not stable?


Not sure how our tests are organized, but maybe retrying the HTTP requests would help?

On Fri, Mar 1, 2019 at 12:46 PM Pablo Estrada wrote:


Hello all,
the website tests are broken. I've filed BEAM-6760
 to track fixing
them, but I wanted to see if anyone has any idea about why it may be
failing.

It's been broken for a few days:
https://builds.apache.org/job/beam_PreCommit_Website_Cron/

And looking at the failures, it seems that they represent broken links:
https://builds.apache.org/job/beam_PreCommit_Website_Cron/725/console

But looking at each of the links opens their website without problems.

It may be some environmental temporary issue, but why would it fail
consistently for the last few days then?
Thoughts?
Thanks
-P.



--

Ruoyun  Huang



Re: [BEAM-6759] CassandraIOTest failing in presubmit in multiple PRs

2019-03-04 Thread Maximilian Michels

Hey Alex,

This is a duplicate: https://issues.apache.org/jira/browse/BEAM-6722

There is a pending PR, though we haven't agreed on merging it. Would be 
nice to fix this.


Cheers,
Max

On 01.03.19 20:09, Alex Amato wrote:

https://issues.apache.org/jira/browse/BEAM-6759


Hi, I have seen this test failing in presubmit in multiple PRs, which 
does not seem to be related to the changes. Any ideas why this is failing 
at the moment?


CassandraIOTest - scans

https://builds.apache.org/job/beam_PreCommit_Java_Commit/4586/testReport/junit/org.apache.beam.sdk.io.cassandra/CassandraIOTest/classMethod/

https://scans.gradle.com/s/btppkeky63a5g/console-log?task=:beam-sdks-java-io-cassandra:test#L7

java.lang.NullPointerException
    at org.cassandraunit.utils.EmbeddedCassandraServerHelper.dropKeyspacesWithNativeDriver(EmbeddedCassandraServerHelper.java:285)
    at org.cassandraunit.utils.EmbeddedCassandraServerHelper.dropKeyspaces(EmbeddedCassandraServerHelper.java:281)
    at org.cassandraunit.utils.EmbeddedCassandraServerHelper.cleanEmbeddedCassandra(EmbeddedCassandraServerHelper.java:193)
    at org.apache.beam.sdk.io.cassandra.CassandraIOTest.stopCassandra(CassandraIOTest.java:129)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
    at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
    at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
    at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
    at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
    at java.lang.Thread.run(Thread.java:748)