Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Ismaël Mejía
Hello and welcome Griselda, Umang, Justin

Apart from the links provided by Ahmet, you might read Beam-related
material on the website (see Documentation > Programming Guide and
Documentation > Additional Resources, among others).

But probably as important as improving your Beam-related knowledge is
understanding the principles of an open source project and, more
concretely, the way Apache projects work (in case this is your
first Apache project): concepts like how projects are structured
(PMCs, committers, votes, etc.) and, most important of all, Community
over Code and Meritocracy.

https://www.apache.org/foundation/how-it-works.html
https://blogs.apache.org/foundation/entry/asf_15_community_over_code

Welcome all, and don't hesitate to ask questions. We are all here to
make this project better, so we can certainly help.
Ismaël


On Tue, Aug 15, 2017 at 11:04 PM, Justin T  wrote:
> Hello Beam community,
>
> I am also a new member, and I feel a little better knowing that there
> are others in the same boat :)
>
> My name is Justin and I work as a full stack engineer for Neustar, a
> marketing analytics company in San Diego. Over the past few weeks I have
> been getting more familiar with Beam via documentation, papers, videos, and
> the old email archives and I am very excited to start making contributions.
> Thank you Altay for the useful links!
>
> -Justin Tumale
>
> On Tue, Aug 15, 2017 at 11:19 AM, Ahmet Altay 
> wrote:
>
>> Welcome both of you!
>>
>> Some helpful starting points:
>> - Contribution guide [1]
>> - Unassigned starter issues in JIRA [2]
>>
>> Ahmet
>>
>> [1] https://beam.apache.org/contribute/contribution-guide/
>> [2]
>> https://issues.apache.org/jira/browse/BEAM-2632?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20starter%20AND%20assignee%20in%20(EMPTY)%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
>>
>> On Tue, Aug 15, 2017 at 11:13 AM, Umang Sharma 
>> wrote:
>>
>> > Hi Gris,
>> > Nice to meet you.
>> >
>> > I'd like to take this opportunity to introduce myself to you and everyone
>> > else in the dev team.
>> >
>> > I’m Umang Sharma. I'm an associate in Data Science and Applications at
>> > Accenture Digital.
>> >
>> >
>> > I write in Python, Java and a number of other languages.
>> > I'd love to contribute to Beam. It'd be great if someone could guide me to get
>> > started with contributing :)
>> >
>> > Among the other things I like are polo golf, giving talks and talking
>> > about my work.
>> >
>> > Thanks,
>> > Umang
>> >
>> >
>> > On Aug 15, 2017 22:40, "Griselda Cuevas" 
>> wrote:
>> >
>> > Hi Beam community,
>> >
>> > I’m Griselda (Gris) Cuevas and I’m very excited to join the community. I’m
>> > looking forward to learning awesome things from you and to getting the
>> > chance to collaborate on great initiatives.
>> >
>> > I’m currently working at Google and I’m studying a masters in operations
>> > research and data science at UC Berkeley. I’m interested in Natural
>> > Language Processing, Information Retrieval and Online Communities. Some
>> > other fun topics I love are juggling, camping and -just getting into it-
>> >  listening to podcasts, so if you ever want to discuss and talk about any
>> > of these topics, here I am!
>> >
>> > Another reason why I’m here is that I want to help this project grow and
>> > thrive. This means that you’ll see me contributing to the project, reaching
>> > out to ask questions as I get familiar with our community, and also
>> > helping evangelize Apache Beam by organizing meetups, hangouts, etc.
>> >
>> > I say bye for now, I’ll see you around,
>> >
>> > Cheers,
>> >
>> > G
>> >
>>


Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Jean-Baptiste Onofré
Welcome !

Regards
JB



Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Kobi Salant
Hi,

The Spark runner was tested with the word count example and a more complex
session-based application on a YARN cluster.
Both applications ran successfully, so we can say that the Spark runner passed
the sanity tests needed.

Still, there is an open ticket,
https://issues.apache.org/jira/browse/BEAM-2671, which Stas is working on,
and its implications should be taken into consideration regarding the
release.

Regards
Kobi

2017-08-16 5:02 GMT+03:00 Eugene Kirpichov :

> Hey all,
>
> Seems like we're missing one more affirmative vote from a PMC member (so
> far we have JB and Ahmet) to proceed with the release.
>
> On Mon, Aug 14, 2017 at 9:30 AM Ahmet Altay 
> wrote:
>
> > On Mon, Aug 14, 2017 at 6:32 AM, Ismaël Mejía  wrote:
> >
> > > +1 (non-binding)
> > >
> > > - Validated signatures OK
> > > - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle JDK 8 with
> > > the docker development images (WIP), both OK
> > > - Run WordCount on local Flink and Spark runners OK
> > >
> > > Everything looks nice, only one minor thing (not blocking at all). The
> > > proto generated files for python are not cleaned correctly and this
> > > causes the validation to complain because the maven rat plugin does
> > > not find the apache headers on the files  (this happens if you execute
> > > mvn clean verify -Prelease immediately after the validation).
> > >
> >
> > Ismaël, could you create a JIRA issue for this (to be fixed at a future
> > release)?
> >
> >
> > >
> > > On Sun, Aug 13, 2017 at 6:52 AM, Jean-Baptiste Onofré  >
> > > wrote:
> > > > +1 (binding)
> > > >
> > > > I did my own tests and am casting my own vote ;)
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:
> > > >>
> > > >> Hi everyone,
> > > >>
> > > >> Please review and vote on the release candidate #3 for the version 2.1.0,
> > > >> as follows:
> > > >>
> > > >> [ ] +1, Approve the release
> > > >> [ ] -1, Do not approve the release (please provide specific comments)
> > > >>
> > > >>
> > > >> The complete staging area is available for your review, which includes:
> > > >> * JIRA release notes [1],
> > > >> * the official Apache source release to be deployed to dist.apache.org
> > > >> [2], which is signed with the key with fingerprint C8282E76 [3],
> > > >> * all artifacts to be deployed to the Maven Central Repository [4],
> > > >> * source code tag "v2.1.0-RC3" [5],
> > > >> * website pull request listing the release and publishing the API
> > > >> reference manual [6].
> > > >> * Python artifacts are deployed along with the source release to the
> > > >> dist.apache.org [2].
> > > >>
> > > >> The vote will be open for at least 72 hours. It is adopted by majority
> > > >> approval, with at least 3 PMC affirmative votes.
> > > >>
> > > >> Thanks,
> > > >> JB
> > > >>
> > > >> [1]
> > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12340528
> > > >> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
> > > >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > >> [4] https://repository.apache.org/content/repositories/orgapachebeam-1020/
> > > >> [5] https://github.com/apache/beam/tree/v2.1.0-RC3
> > > >> [6] https://github.com/apache/beam-site/pull/270
> > > >
> > > >
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > >
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Aviem Zur
It makes sense to close PRs after a long period of inactivity and no response
and, as Kenn mentioned, they can always be re-opened.

On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
wrote:

> If we consider the author, it makes sense.
>
> Regards
> JB
>
> On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> >The proposal makes sense.
> >
> >If the author of a PR doesn't respond for 90 days, the PR is likely out of
> >sync with the current repo.
> >
> >Cheers
> >
> >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay 
> >wrote:
> >
> >> Hi all,
> >>
> >> Do we have an existing policy for handling stale PRs? If not, could we come
> >> up with one? We are getting close to 100 open PRs. Some of the open PRs
> >> have not been touched for a while, and if we exclude the pings the number
> >> will be higher.
> >>
> >> For example, we could close PRs that have not been updated by the original
> >> author for 90 days even after multiple attempts to reach them (e.g. [1],
> >> [2] are such PRs.)
> >>
> >> What do you think?
> >>
> >> Thank you,
> >> Ahmet
> >>
> >> [1] https://github.com/apache/beam/pull/1464
> >> [2] https://github.com/apache/beam/pull/2949
> >>
>


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Lukasz Cwik
Back from vacation.

+1 binding

BEAM-2671 has been marked for 2.2.0 release.





Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Eugene Kirpichov
Thanks Luke! With your vote, we have 3 PMC affirmative votes.
JB, what are the next steps to finalize the release?



Re: ConcurrentModificationException while performing checkpoint for Kinesis stream

2017-08-16 Thread Lukasz Cwik
Moved to dev@beam.apache.org

On Wed, Aug 16, 2017 at 9:22 AM, Pawel Bartoszek  wrote:

> When Flink performs a checkpoint I randomly get a
> ConcurrentModificationException.
>
> From my investigation it looks like the method
>
> public boolean advance() throws IOException
>
> 
> from
>
> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReader.java
>
> is called in another thread while checkpoint is being performed.
>
> The exception is caused because the method
>
> public UnboundedSource.CheckpointMark getCheckpointMark()
>
> from KinesisReader.java is iterating over the iterator returned by
> RoundRobin.iterator() while
>
> public boolean advance() throws IOException
>
> is calling RoundRobin<ShardRecordsIterator>.moveForward() from another
> thread, which causes java.util.ConcurrentModificationException to be thrown.
>
>
> The RoundRobin class uses a java.util.Deque queue,
> which doesn't allow adding or removing elements while it's being iterated.
>
> Is some locking missing?
>
> I am using Beam 2.0.0, Flink 1.2.1, 20 slots and 32 kinesis shards.
>
> I created a bug for it as well
> https://issues.apache.org/jira/browse/BEAM-2752
>
> Stacktrace:
>
> java.lang.Exception: Error while triggering checkpoint 59 for Source: 
> Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> Flat Map 
> -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) -> Flat 
> Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) -> Flat 
> Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE MINUTE 
> WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1136)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not perform checkpoint 59 for operator 
> Source: Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> 
> Flat Map -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) 
> -> Flat Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) 
> -> Flat Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE 
> MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:524)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1125)
>   ... 5 more
> Caused by: java.lang.Exception: Could not complete snapshot 59 for operator 
> Source: Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) -> 
> Flat Map -> ParMultiDo(StringToRecord) -> Flat Map -> ParMultiDo(Anonymous) 
> -> Flat Map -> ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) 
> -> Flat Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE 
> MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem, 
> ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) -> Flat Map 
> -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map -> 
> ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:379)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1090)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:630)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:575)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:518)
>   ... 6 m
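
The race described in the message above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the KinesisReader code: GuardedRoundRobin and CheckpointRaceDemo are hypothetical names, and an ArrayDeque is assumed as the backing java.util.Deque. The first part reproduces the failure on a single thread by rotating the deque while an iterator is open; the second part shows one possible remedy, which is to guard rotation with a lock and let the checkpoint iterate over a copy taken under that same lock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a round-robin container; not Beam's RoundRobin class. */
class GuardedRoundRobin<T> {
    private final Deque<T> deque;

    GuardedRoundRobin(List<T> items) {
        this.deque = new ArrayDeque<>(items);
    }

    /** Rotates the head element to the tail, as a reader's advance() might do. */
    synchronized void moveForward() {
        deque.addLast(deque.removeFirst());
    }

    /** Copies the elements under the same lock, so callers iterate a stable list. */
    synchronized List<T> snapshot() {
        return new ArrayList<>(deque);
    }
}

class CheckpointRaceDemo {
    /** Reproduces the failure on one thread: rotate a deque while an iterator is open. */
    static boolean unguardedIterationFails() {
        Deque<String> shards =
                new ArrayDeque<>(Arrays.asList("shard-0", "shard-1", "shard-2"));
        Iterator<String> it = shards.iterator();
        // Structural change behind the iterator's back (what moveForward() does).
        shards.addLast(shards.removeFirst());
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded iteration fails: " + unguardedIterationFails());

        GuardedRoundRobin<String> rr =
                new GuardedRoundRobin<>(Arrays.asList("shard-0", "shard-1", "shard-2"));
        List<String> checkpointView = rr.snapshot(); // taken while holding the lock
        rr.moveForward();                            // safe: does not touch the copy
        System.out.println("checkpoint view: " + checkpointView);
        System.out.println("after rotation:  " + rr.snapshot());
    }
}
```

Copying under the lock keeps the critical section short, at the cost of one allocation per checkpoint. Whether that trade-off suits the real reader is a question for BEAM-2752.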

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Jean-Baptiste Onofré
Hi

Thanks. I will send the result e-mail, promote the artifacts on Central and 
dist.apache.org. Then I will prepare the announcement (website and mailing 
lists).

Regards
JB



Re: Proposal : An extension for sketch-based statistics

2017-08-16 Thread Arnaud Fournier
Thanks for bringing these subjects into the discussion, Ismaël.

For the second point about the standard deviation, I just want to add that
this could also be added to the distribution metric.
Actually I think this makes much more sense than just adding a new transform
for this (we can also do both).

Indeed, we just need to keep track of the sum of squared elements in the
DistributionData.
Then the standard deviation can simply be computed inside a method, as is
done for the mean in the DistributionResult.

I could take care of this.

What do you think about this?
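
The sum-of-squares idea above can be sketched as follows. This is only an illustration under stated assumptions: DistributionSketch is a hypothetical class, not Beam's DistributionData or DistributionResult, and it computes the population standard deviation from the identity Var(X) = E[X^2] - (E[X])^2. The point it shows is that count, sum and sum of squares are all plain additive quantities, so two partial accumulators can be merged without revisiting the elements.

```java
/** Hypothetical accumulator for illustration; this is not Beam's DistributionData. */
class DistributionSketch {
    private long count;
    private double sum;
    private double sumOfSquares;

    /** Folds one observed value into the running totals. */
    void update(long value) {
        count++;
        sum += value;
        sumOfSquares += (double) value * value;
    }

    /** Merges two partial accumulators; every field is additive. */
    DistributionSketch combine(DistributionSketch other) {
        DistributionSketch merged = new DistributionSketch();
        merged.count = this.count + other.count;
        merged.sum = this.sum + other.sum;
        merged.sumOfSquares = this.sumOfSquares + other.sumOfSquares;
        return merged;
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Population standard deviation derived from the running totals. */
    double stdDev() {
        if (count == 0) {
            return Double.NaN;
        }
        double mean = sum / count;
        double variance = sumOfSquares / count - mean * mean;
        // Rounding can push a tiny variance slightly below zero; clamp it.
        return Math.sqrt(Math.max(0.0, variance));
    }

    public static void main(String[] args) {
        DistributionSketch d = new DistributionSketch();
        for (long v : new long[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            d.update(v);
        }
        System.out.println("mean = " + d.mean() + ", stdDev = " + d.stdDev());
    }
}
```

One caveat on this design: subtracting two large, nearly equal quantities can lose precision when the mean is large relative to the spread, so a production version might prefer a more numerically stable update scheme.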

2017-08-14 15:15 GMT+02:00 Ismaël Mejía :

> Kenneth’s idea of using sketches for state with the State API is
> really interesting, it really opens some interesting use cases, I
> haven’t really thought about it but I believe it is really an
> appealing use case for the sketches. Note that the origin of this work
> was in the line of statistics, in particular we were interested in
> data sketches (specially the Cardinality ones) as a ‘lightweight’ way
> to have approximate metrics.
>
> There are two pending subjects to discuss:
>
> 1. Having sketches as approximate metrics seems interesting, however
> the current Beam Metrics API does not allow User-Defined Metrics. I
> don’t really know the details of the current metrics implementation.
> Is it eventually possible to support this? I mean to extend metrics to
> reuse something like the sketches extension?
>
> 2. There is also another contribution that Arnaud did in case there is
> interest, it is just a transform for standard deviation. We decided
> not to include it as part of the sketches extension since it was not
> consistent with the approximate nature of the extension, but I think
> it could be another interesting contribution as a subsequent PR (if
> there is interest also on this).
>
> Regards,
> Ismaël
>
> On Sat, Aug 12, 2017 at 11:20 AM, Arnaud Fournier
>  wrote:
> > Hello Kenneth, thank you for your answer.
> >
> > I read your blog post about stateful processing and that is indeed a
> great
> > feature !
> >
> > So if I understood correctly we could use the combineFns to declare
> > combiningStates so it can be used while processing elements in a DoFn.
> That
> > opens up a lot more use cases for the sketches !
> >
> > Actually this was already possible for 2 sketches but now I refined the
> > constructors of the 2 other sketches, and will do so for the other ones
> to
> > come.
> >
> >
> > Regards,
> >
> > Arnaud
> >
> > 2017-08-08 2:07 GMT+02:00 Kenneth Knowles :
> >
> >> This is a great development! I have wanted Beam to have a library of
> >> sketches.
> >>
> >> What Eugene is referring to is the fact that you can write
> >> Combine.perKey(combineFn) to use these in a transform but also
> >> StateSpecs.combiningState(combineFn) to use them in a stateful ParDo.
> So
> >> it
> >> is good to make the CombineFn public and refine their constructors to be
> >> user-friendly.
> >>
> >> Kenn
> >>
> >> On Fri, Aug 4, 2017 at 7:45 AM, Arnaud Fournier <
> >> arnaudfournier...@gmail.com
> >> > wrote:
> >>
> >> > Thanks for your comments, that is very encouraging !
> >> >
> >> > I have created a Jira : https://issues.apache.org/jira
> /browse/BEAM-2728
> >> > and a PR : https://github.com/apache/beam/pull/3686
> >> >
> >> > Eugene and Lucas I saw that you already have some ideas so I put you
> as
> >> > reviewers,
> >> > I look forward to hear more from you.
> >> >
> >> > With Ismael and JB, we already thought about using some of these
> >> indicators
> >> > as metric cells,
> >> > as it can be useful for some kinds of monitoring.
> >> > But I have never heard about state cells, is it something like the
> >> > QuantileState in ApproximateQuantiles ?
> >> >
> >> >
> >> >
> >> > 2017-08-04 3:14 GMT+02:00 Anand Iyer :
> >> >
> >> > > This is awesome!! Very exciting to see the addition of statistical
> and
> >> > > data-mining algorithms to Apache Beam.
> >> > >
> >> > > On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <
> >> > > kirpic...@google.com.invalid> wrote:
> >> > >
> >> > > > +1, Very exciting! I have some suggestions on the exact API to
> expose
> >> > > (e.g.
> >> > > > I think it makes sense to expose the CombineFn's directly, so that
> >> they
> >> > > can
> >> > > > also be used for combining state cells and not just as
> PTransforms),
> >> > but
> >> > > > that can be handled during regular code review.
> >> > > >
> >> > > > On Thu, Aug 3, 2017 at 2:23 PM Sourabh Bajaj
> >> > > >  wrote:
> >> > > >
> >> > > > > +1 to this.
> >> > > > >
> >> > > > > On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik
> >>  >> > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I'm most interested in the frequency / cardinality tools as it
> >> > could
> >> > > be
> >> > > > > > used to help improve performance automatically for combiners
> by
> >> > > > detecting
> >> > > > > > the few keys case or automatically handle hot keys without
> >> needing
> >> > > > users
> >> > > > > to
> >> > > > > > specify the hints when they 

Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Asha Rostamianfar
Hi everyone,

I have a proposal to add a new built-in I/O source for VCF files:
https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit

I'm planning to take on the implementation work myself, but wanted to get
preliminary feedback about the proposed design as it requires making
changes to the existing TextIO. I will file a JIRA FR as well.

Please take a look at the doc and feel free to comment.

Thanks,
Asha


Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Jean-Baptiste Onofré
I will, thanks!

Regards
JB

On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar 
 wrote:
>Hi everyone,
>
>I have a proposal to add a new built-in I/O source for VCF files:
>https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
>
>I'm planning to take on the implementation work myself, but wanted to
>get
>preliminary feedback about the proposed design as it requires making
>changes to the existing TextIO. I will file a JIRA FR as well.
>
>Please take a look at the doc and feel free to comment.
>
>Thanks,
>Asha


Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Eugene Kirpichov
+Chamikara Jayalath 
You may also find the recent discussion on WholeFileIO useful:
https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E
https://github.com/apache/beam/pull/3543 (I think the bulk of the discussion
happened there)
https://github.com/apache/beam/pull/3717


On Wed, Aug 16, 2017 at 10:58 AM Jean-Baptiste Onofré 
wrote:

> I will thanks !
>
> Regards
> JB
>
> On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar
>  wrote:
> >Hi everyone,
> >
> >I have a proposal to add a new built-in I/O source for VCF files:
> >
> https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> >
> >I'm planning to take on the implementation work myself, but wanted to
> >get
> >preliminary feedback about the proposed design as it requires making
> >changes to the existing TextIO. I will file a JIRA FR as well.
> >
> >Please take a look at the doc and feel free to comment.
> >
> >Thanks,
> >Asha
>


Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Robert Bradshaw
+1 binding

(I've been on vacation as well.)

On Wed, Aug 16, 2017 at 8:50 AM, Lukasz Cwik  wrote:
> Back from vacation.
>
> +1 binding
>
> BEAM-2671 has been marked for 2.2.0 release.
>
>
>
> On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant  wrote:
>
>> Hi,
>>
>> Spark runner was tested with word count example and a more complex session
>> based application on a yarn cluster.
>> Both application run successfully so we can say that spark runner passed
>> the sanity tests needed.
>>
>> Still there is an open ticket
>> https://issues.apache.org/jira/browse/BEAM-2671 which Stas is working on
>> and its implications should be taken into consideration regarding the
>> release.
>>
>> Regards
>> Kobi
>>
>> 2017-08-16 5:02 GMT+03:00 Eugene Kirpichov :
>>
>> > Hey all,
>> >
>> > Seems like we're missing one more affirmative vote from a PMC member (so
>> > far we have JB and Ahmet) to proceed with the release.
>> >
>> > On Mon, Aug 14, 2017 at 9:30 AM Ahmet Altay 
>> > wrote:
>> >
>> > > On Mon, Aug 14, 2017 at 6:32 AM, Ismaël Mejía 
>> wrote:
>> > >
>> > > > +1 (non-binding)
>> > > >
>> > > > - Validated signatures OK
>> > > > - mvn clean verify -Prelease on both OpenJDK 1.7 and Oracle JDK 8
>> with
>> > > > the docker development images (WIP), both OK
>> > > > - Run WordCount on local Flink and Spark runners OK
>> > > >
>> > > > Everything looks nice, only one minor thing (not blocking at all).
>> The
>> > > > proto generated files for python are not cleaned correctly and this
>> > > > causes the validation to complain because the maven rat plugin does
>> > > > not find the apache headers on the files  (this happens if you
>> execute
>> > > > mvn clean verify -Prelease immediately after the validation).
>> > > >
>> > >
>> > > Ismaël, could you create a JIRA issue for this (to be fixed at a future
>> > > release)?
>> > >
>> > >
>> > > >
>> > > > On Sun, Aug 13, 2017 at 6:52 AM, Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> > >
>> > > > wrote:
>> > > > > +1 (binding)
>> > > > >
>> > > > > I do my own tests and casting my own vote ;)
>> > > > >
>> > > > > Regards
>> > > > > JB
>> > > > >
>> > > > > On 08/09/2017 07:08 AM, Jean-Baptiste Onofré wrote:
>> > > > >>
>> > > > >> Hi everyone,
>> > > > >>
>> > > > >> Please review and vote on the release candidate #3 for the version
>> > > > 2.1.0,
>> > > > >> as follows:
>> > > > >>
>> > > > >> [ ] +1, Approve the release
>> > > > >> [ ] -1, Do not approve the release (please provide specific
>> > comments)
>> > > > >>
>> > > > >>
>> > > > >> The complete staging area is available for your review, which
>> > > includes:
>> > > > >> * JIRA release notes [1],
>> > > > >> * the official Apache source release to be deployed to
>> > > dist.apache.org
>> > > > >> [2], which is signed with the key with fingerprint C8282E76 [3],
>> > > > >> * all artifacts to be deployed to the Maven Central Repository
>> [4],
>> > > > >> * source code tag "v2.1.0-RC3" [5],
>> > > > >> * website pull request listing the release and publishing the API
>> > > > >> reference manual [6].
>> > > > >> * Python artifacts are deployed along with the source release to
>> the
>> > > > >> dist.apache.org [2].
>> > > > >>
>> > > > >> The vote will be open for at least 72 hours. It is adopted by
>> > majority
>> > > > >> approval, with at least 3 PMC affirmative votes.
>> > > > >>
>> > > > >> Thanks,
>> > > > >> JB
>> > > > >>
>> > > > >> [1]
>> > > > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> > > > projectId=12319527&version=12340528
>> > > > >> [2] https://dist.apache.org/repos/dist/dev/beam/2.1.0/
>> > > > >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > > > >> [4] https://repository.apache.org/content/repositories/
>> > > > orgapachebeam-1020/
>> > > > >> [5] https://github.com/apache/beam/tree/v2.1.0-RC3
>> > > > >> [6] https://github.com/apache/beam-site/pull/270
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Jean-Baptiste Onofré
>> > > > > jbono...@apache.org
>> > > > > http://blog.nanthrax.net
>> > > > > Talend - http://www.talend.com
>> > > >
>> > >
>> >
>>


contrib package for beam?

2017-08-16 Thread Pablo Estrada
Hi all,
What would be an appropriate medium for contributions such as utility
Pipelines or PTransforms? Perhaps it's different for each kind of
contribution (sources/sinks, PTransforms, or utility pipelines).

The question comes from an active user on Stack Overflow[1], and it seems
pertinent. What is the standard practice in other projects for keeping these
sorts of contributions shared and available? Perhaps we could keep a list with
links in our README, on the Beam site, or somewhere else?

Best
-P.

1 -
https://stackoverflow.com/questions/45603814/any-contrib-package-for-apache-beam-where-i-can-commit-a-dataflow-pipeline


Re: contrib package for beam?

2017-08-16 Thread Jesse Anderson
I've had this discussion before. I'd love to see one so that there's a
consistent home for things that don't belong in the API.

On Wed, Aug 16, 2017, 2:55 PM Pablo Estrada 
wrote:

> Hi all,
> What would be an appropriate medium for contributions such as utility
> Pipelines or PTransforms? Perhaps it's different for each kind of
> contribution (sources/sinks, PTransforms, or utility pipelines).
>
> The question comes from an active user on Stack Overflow[1], and it seems
> pertinent. What's standard practice in other projects to keep this sort of
> contributions shared and available? Perhaps keep a list with links in our
> readme, or the beam site, or something else?
>
> Best
> -P.
>
> 1 -
>
> https://stackoverflow.com/questions/45603814/any-contrib-package-for-apache-beam-where-i-can-commit-a-dataflow-pipeline
>
-- 
Thanks,

Jesse


Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Chamikara Jayalath
Thanks for proposing this.

I left some comments. My main concern is the complexity this might add to
textio and the potential performance impact. So at this point I would prefer
that this be implemented as a new filebasedsource instead of updating textio.
I'm open to being convinced otherwise :).

Thanks,
Cham

On Wed, Aug 16, 2017 at 11:01 AM Eugene Kirpichov
 wrote:

> +Chamikara Jayalath 
> Also you may find useful the recent discussion on WholeFileIO
>
> https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E
> https://github.com/apache/beam/pull/3543 (I think bulk of discussion
> happened there)
> https://github.com/apache/beam/pull/3717
>
>
> On Wed, Aug 16, 2017 at 10:58 AM Jean-Baptiste Onofré 
> wrote:
>
> > I will thanks !
> >
> > Regards
> > JB
> >
> > On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar
> >  wrote:
> > >Hi everyone,
> > >
> > >I have a proposal to add a new built-in I/O source for VCF files:
> > >
> >
> https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> > >
> > >I'm planning to take on the implementation work myself, but wanted to
> > >get
> > >preliminary feedback about the proposed design as it requires making
> > >changes to the existing TextIO. I will file a JIRA FR as well.
> > >
> > >Please take a look at the doc and feel free to comment.
> > >
> > >Thanks,
> > >Asha
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Ahmet Altay
Sounds like we have consensus. Since this is a new policy, I would suggest
picking the most flexible option for now (90 days) and we can tighten it in
the future. To answer Kenn's question, I do not know how other projects
handle this. I did a basic search but could not find a good answer.

What mechanism can we use to close PRs, assuming that the author will be out
of communication? We can push a commit with a "This closes #xyz #abc" message.
Is there another way to do this?
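
As a rough illustration of the commit-message approach, the sketch below
builds such a message in Python. The function name and the example PR
numbers are hypothetical. It assumes that each PR number needs its own
closing keyword, which is how GitHub documents references to multiple
issues; whether that applies to this repository's setup is an assumption
worth checking.

```python
from typing import Sequence


def closing_message(summary: str, pr_numbers: Sequence[int]) -> str:
    """Build a commit message that references each PR with a closing keyword.

    Hypothetical helper: the keyword is repeated for every number on the
    assumption that a bare "#abc" after the first reference is not closed.
    """
    if not pr_numbers:
        raise ValueError("at least one PR number is required")
    references = ", ".join(f"closes #{number}" for number in pr_numbers)
    return f"{summary}\n\nThis {references}"


if __name__ == "__main__":
    # The PR numbers here are made up for the example.
    print(closing_message("Close stale pull requests", [101, 102]))
```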

Ahmet

On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:

> Makes sense to close after a long time of inactivity and no response, and
> as Kenn mentioned they can always re-open.
>
> On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
> wrote:
>
> > If we consider the author, it makes sense.
> >
> > Regards
> > JB
> >
> > On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> > >The proposal makes sense.
> > >
> > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > >of
> > >sync with current repo.
> > >
> > >Cheers
> > >
> > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay 
> > >wrote:
> > >
> > >> Hi all,
> > >>
> > >> Do we have an existing policy for handling stale PRs? If not could we
> > >come
> > >> up with one. We are getting close to 100 open PRs. Some of the open
> > >PRs
> > >> have not been touched for a while, and if we exclude the pings the
> > >number
> > >> will be higher.
> > >>
> > >> For example, we could close PRs that have not been updated by the
> > >original
> > >> author for 90 days even after multiple attempts to reach them (e.g.
> > >[1],
> > >> [2] are such PRs.)
> > >>
> > >> What do you think?
> > >>
> > >> Thank you,
> > >> Ahmet
> > >>
> > >> [1] https://github.com/apache/beam/pull/1464
> > >> [2] https://github.com/apache/beam/pull/2949
> > >>
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Sourabh Bajaj
Some projects I have seen close stale PRs after 30 days, saying "Closing
due to lack of activity, please feel free to re-open".

On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay 
wrote:

> Sounds like we have consensus. Since this is a new policy, I would suggest
> picking the most flexible option for now (90 days) and we can tighten it in
> the future. To answer Kenn's question, I do not know, how other projects
> handle this. I did a basic search but could not find a good answer.
>
> What mechanism can we use to close PRs, assuming that author will be out of
> communication. We can push a commit with a "This closes #xyz #abc" message.
> Is there another way to do this?
>
> Ahmet
>
> On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>
> > Makes sense to close after a long time of inactivity and no response, and
> > as Kenn mentioned they can always re-open.
> >
> > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
> > wrote:
> >
> > > If we consider the author, it makes sense.
> > >
> > > Regards
> > > JB
> > >
> > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> > > >The proposal makes sense.
> > > >
> > > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > > >of
> > > >sync with current repo.
> > > >
> > > >Cheers
> > > >
> > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay  >
> > > >wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Do we have an existing policy for handling stale PRs? If not could
> we
> > > >come
> > > >> up with one. We are getting close to 100 open PRs. Some of the open
> > > >PRs
> > > >> have not been touched for a while, and if we exclude the pings the
> > > >number
> > > >> will be higher.
> > > >>
> > > >> For example, we could close PRs that have not been updated by the
> > > >original
> > > >> author for 90 days even after multiple attempts to reach them (e.g.
> > > >[1],
> > > >> [2] are such PRs.)
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thank you,
> > > >> Ahmet
> > > >>
> > > >> [1] https://github.com/apache/beam/pull/1464
> > > >> [2] https://github.com/apache/beam/pull/2949
> > > >>
> > >
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
What should be done to the JIRA associated with the PR?

Original message
From: Ahmet Altay
Date: 8/16/17 12:05 PM (GMT-08:00)
To: dev@beam.apache.org
Subject: Re: Policy for stale PRs

Sounds like we have consensus. Since this is a new policy, I would suggest
picking the most flexible option for now (90 days) and we can tighten it in
the future. To answer Kenn's question, I do not know, how other projects
handle this. I did a basic search but could not find a good answer.

What mechanism can we use to close PRs, assuming that author will be out of
communication. We can push a commit with a "This closes #xyz #abc" message.
Is there another way to do this?

Ahmet

On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:

> Makes sense to close after a long time of inactivity and no response, and
> as Kenn mentioned they can always re-open.
>
> On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
> wrote:
>
> > If we consider the author, it makes sense.
> >
> > Regards
> > JB
> >
> > On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> > >The proposal makes sense.
> > >
> > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > >of
> > >sync with current repo.
> > >
> > >Cheers
> > >
> > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay 
> > >wrote:
> > >
> > >> Hi all,
> > >>
> > >> Do we have an existing policy for handling stale PRs? If not could we
> > >come
> > >> up with one. We are getting close to 100 open PRs. Some of the open
> > >PRs
> > >> have not been touched for a while, and if we exclude the pings the
> > >number
> > >> will be higher.
> > >>
> > >> For example, we could close PRs that have not been updated by the
> > >original
> > >> author for 90 days even after multiple attempts to reach them (e.g.
> > >[1],
> > >> [2] are such PRs.)
> > >>
> > >> What do you think?
> > >>
> > >> Thank you,
> > >> Ahmet
> > >>
> > >> [1] https://github.com/apache/beam/pull/1464
> > >> [2] https://github.com/apache/beam/pull/2949
> > >>
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Lukasz Cwik
I think the JIRA should remain open and possibly become unassigned.

On Wed, Aug 16, 2017 at 12:16 PM, Ted Yu  wrote:

> What should be done to the JIRA associated with the PR?
>  Original message From: Ahmet Altay
>  Date: 8/16/17  12:05 PM  (GMT-08:00) To:
> dev@beam.apache.org Subject: Re: Policy for stale PRs
> Sounds like we have consensus. Since this is a new policy, I would suggest
> picking the most flexible option for now (90 days) and we can tighten it in
> the future. To answer Kenn's question, I do not know, how other projects
> handle this. I did a basic search but could not find a good answer.
>
> What mechanism can we use to close PRs, assuming that author will be out of
> communication. We can push a commit with a "This closes #xyz #abc" message.
> Is there another way to do this?
>
> Ahmet
>
> On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>
> > Makes sense to close after a long time of inactivity and no response, and
> > as Kenn mentioned they can always re-open.
> >
> > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré 
> > wrote:
> >
> > > If we consider the author, it makes sense.
> > >
> > > Regards
> > > JB
> > >
> > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu  wrote:
> > > >The proposal makes sense.
> > > >
> > > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > > >of
> > > >sync with current repo.
> > > >
> > > >Cheers
> > > >
> > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay  >
> > > >wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Do we have an existing policy for handling stale PRs? If not could
> we
> > > >come
> > > >> up with one. We are getting close to 100 open PRs. Some of the open
> > > >PRs
> > > >> have not been touched for a while, and if we exclude the pings the
> > > >number
> > > >> will be higher.
> > > >>
> > > >> For example, we could close PRs that have not been updated by the
> > > >original
> > > >> author for 90 days even after multiple attempts to reach them (e.g.
> > > >[1],
> > > >> [2] are such PRs.)
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thank you,
> > > >> Ahmet
> > > >>
> > > >> [1] https://github.com/apache/beam/pull/1464
> > > >> [2] https://github.com/apache/beam/pull/2949
> > > >>
> > >
> >
>


Re: contrib package for beam?

2017-08-16 Thread Griselda Cuevas
I like the idea -- This seems like a thing I can help with to get familiar
with the project. Who could help me make a list of available things?

On 16 August 2017 at 12:02, Jesse Anderson 
wrote:

> I've had this discussion before. I'd love to see one so that there's a
> consistent home for things that don't belong in the API.
>
> On Wed, Aug 16, 2017, 2:55 PM Pablo Estrada 
> wrote:
>
> > Hi all,
> > What would be an appropriate medium for contributions such as utility
> > Pipelines or PTransforms? Perhaps it's different for each kind of
> > contribution (sources/sinks, PTransforms, or utility pipelines).
> >
> > The question comes from an active user on Stack Overflow[1], and it seems
> > pertinent. What's standard practice in other projects to keep this sort
> of
> > contributions shared and available? Perhaps keep a list with links in our
> > readme, or the beam site, or something else?
> >
> > Best
> > -P.
> >
> > 1 -
> >
> > https://stackoverflow.com/questions/45603814/any-
> contrib-package-for-apache-beam-where-i-can-commit-a-dataflow-pipeline
> >
> --
> Thanks,
>
> Jesse
>


Re: Policy for stale PRs

2017-08-16 Thread Jean-Baptiste Onofré
IMHO the JIRA should stay open, as it's different from the PR.

Regards
JB

On Aug 16, 2017, 20:16, at 20:16, Ted Yu  wrote:
>What should be done to the JIRA associated with the PR?
> Original message From: Ahmet Altay
> Date: 8/16/17  12:05 PM  (GMT-08:00) To:
>dev@beam.apache.org Subject: Re: Policy for stale PRs
>Sounds like we have consensus. Since this is a new policy, I would
>suggest
>picking the most flexible option for now (90 days) and we can tighten
>it in
>the future. To answer Kenn's question, I do not know, how other
>projects
>handle this. I did a basic search but could not find a good answer.
>
>What mechanism can we use to close PRs, assuming that author will be
>out of
>communication. We can push a commit with a "This closes #xyz #abc"
>message.
>Is there another way to do this?
>
>Ahmet
>
>On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>
>> Makes sense to close after a long time of inactivity and no response,
>and
>> as Kenn mentioned they can always re-open.
>>
>> On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré
>
>> wrote:
>>
>> > If we consider the author, it makes sense.
>> >
>> > Regards
>> > JB
>> >
>> > On Aug 15, 2017, 01:29, at 01:29, Ted Yu 
>wrote:
>> > >The proposal makes sense.
>> > >
>> > >If the author of PR doesn't respond for 90 days, the PR is likely
>out
>> > >of
>> > >sync with current repo.
>> > >
>> > >Cheers
>> > >
>> > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay
>
>> > >wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Do we have an existing policy for handling stale PRs? If not
>could we
>> > >come
>> > >> up with one. We are getting close to 100 open PRs. Some of the
>open
>> > >PRs
>> > >> have not been touched for a while, and if we exclude the pings
>the
>> > >number
>> > >> will be higher.
>> > >>
>> > >> For example, we could close PRs that have not been updated by
>the
>> > >original
>> > >> author for 90 days even after multiple attempts to reach them
>(e.g.
>> > >[1],
>> > >> [2] are such PRs.)
>> > >>
>> > >> What do you think?
>> > >>
>> > >> Thank you,
>> > >> Ahmet
>> > >>
>> > >> [1] https://github.com/apache/beam/pull/1464
>> > >> [2] https://github.com/apache/beam/pull/2949
>> > >>
>> >
>>


Re: Policy for stale PRs

2017-08-16 Thread Thomas Groh
JIRAs should only be closed if the issue that they track is no longer
relevant (either via being fixed or being determined to not be a problem).
If a JIRA isn't being meaningfully worked on, it should be unassigned (in
all cases, not just if there's an associated pull request that has not been
worked on).

+1 on closing PRs with no action from the original author after some
reasonable time frame (90 days is certainly reasonable; 30 might be too
short) if the author has not responded to actionable feedback.
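
The rule described above can be sketched as a small predicate. This is
only an illustration in Python: the function name, its parameters, and the
idea of tracking a "feedback pending" flag are assumptions made for the
example, not an existing tool or a GitHub API.

```python
from datetime import datetime, timedelta


def is_stale(
    last_author_activity: datetime,
    now: datetime,
    feedback_pending: bool,
    threshold_days: int = 90,
) -> bool:
    """Return True if a PR qualifies for closing under the proposed rule.

    A PR is stale only when reviewers left actionable feedback that the
    author has not answered and the author has been inactive for at least
    threshold_days. Any author activity resets the clock.
    """
    idle = now - last_author_activity
    return feedback_pending and idle >= timedelta(days=threshold_days)


if __name__ == "__main__":
    now = datetime(2017, 8, 16)
    # Author silent for 120 days with unanswered feedback: stale.
    print(is_stale(datetime(2017, 4, 18), now, feedback_pending=True))
    # Author replied 10 days ago: not stale, the clock restarted.
    print(is_stale(datetime(2017, 8, 6), now, feedback_pending=True))
```

Keeping the threshold as a parameter makes it easy to start at 90 days and
tighten it later without changing the rule itself.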

On Wed, Aug 16, 2017 at 12:07 PM, Sourabh Bajaj <
sourabhba...@google.com.invalid> wrote:

> Some projects I have seen close stale PRs after 30 days, saying "Closing
> due to lack of activity, please feel free to re-open".
>
> On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay 
> wrote:
>
> > Sounds like we have consensus. Since this is a new policy, I would
> suggest
> > picking the most flexible option for now (90 days) and we can tighten it
> in
> > the future. To answer Kenn's question, I do not know, how other projects
> > handle this. I did a basic search but could not find a good answer.
> >
> > What mechanism can we use to close PRs, assuming that author will be out
> of
> > communication. We can push a commit with a "This closes #xyz #abc"
> message.
> > Is there another way to do this?
> >
> > Ahmet
> >
> > On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
> >
> > > Makes sense to close after a long time of inactivity and no response,
> and
> > > as Kenn mentioned they can always re-open.
> > >
> > > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré  >
> > > wrote:
> > >
> > > > If we consider the author, it makes sense.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu 
> wrote:
> > > > >The proposal makes sense.
> > > > >
> > > > >If the author of PR doesn't respond for 90 days, the PR is likely
> out
> > > > >of
> > > > >sync with current repo.
> > > > >
> > > > >Cheers
> > > > >
> > > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay
>  > >
> > > > >wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> Do we have an existing policy for handling stale PRs? If not could
> > we
> > > > >come
> > > > >> up with one. We are getting close to 100 open PRs. Some of the
> open
> > > > >PRs
> > > > >> have not been touched for a while, and if we exclude the pings the
> > > > >number
> > > > >> will be higher.
> > > > >>
> > > > >> For example, we could close PRs that have not been updated by the
> > > > >original
> > > > >> author for 90 days even after multiple attempts to reach them
> (e.g.
> > > > >[1],
> > > > >> [2] are such PRs.)
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > >> Thank you,
> > > > >> Ahmet
> > > > >>
> > > > >> [1] https://github.com/apache/beam/pull/1464
> > > > >> [2] https://github.com/apache/beam/pull/2949
> > > > >>
> > > >
> > >
> >
>


Re: Policy for stale PRs

2017-08-16 Thread Ismaël Mejía
Thanks, Ahmet, for bringing up this subject.

+1 to closing stale PRs automatically after a fixed period of inactivity. 90
days is OK, but maybe a shorter period is better. Consider that being stale
just means having no activity, i.e., the author of the PR does not answer any
message. The author can buy extra time just by adding a message saying 'wait,
I am still working on this' and win a complete new period, so the longer the
staleness period is, the longer it can eventually be extended.

I agree with Thomas: the JIRAs should stay open but become unassigned, because
the issue will not be fixed yet and we want to encourage people to work on it.

Another subject that makes sense to discuss here is whether we need policies
to avoid 'stale' JIRAs (JIRAs that have been taken but show no progress), for
example:

- Prevent contributors/committers from taking more than 'n' JIRAs at the same
  time (we should define this n considering the period of staleness, maybe 10?).

- Automatically free 'stale' JIRAs after a fixed time period with no active work
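
Both policies could be sketched roughly as follows. This is plain Python
over an in-memory list: the Issue type, the function names, and the default
limits (10 issues, 90 days) are assumptions for illustration, and nothing
here calls the real JIRA API.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Issue(NamedTuple):
    """Hypothetical minimal view of a tracked issue."""

    key: str
    assignee: Optional[str]
    last_progress: datetime


def can_take(issues: List[Issue], person: str, max_assigned: int = 10) -> bool:
    # Policy 1: a person may hold fewer than max_assigned issues at once.
    held = sum(1 for issue in issues if issue.assignee == person)
    return held < max_assigned


def release_stale(
    issues: List[Issue], now: datetime, max_idle_days: int = 90
) -> List[Issue]:
    # Policy 2: unassign issues with no progress since the cutoff date.
    # Issues stay open; only the assignee is cleared.
    cutoff = now - timedelta(days=max_idle_days)
    return [
        issue._replace(assignee=None)
        if issue.assignee is not None and issue.last_progress < cutoff
        else issue
        for issue in issues
    ]
```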

Remember, the objective is to encourage more people to contribute, but people
won't be encouraged to contribute on subjects that other people have already
taken. This is a well-known anti-pattern in volunteer communities, see
http://communitymgt.wikia.com/wiki/Cookie_Licking

On Wed, Aug 16, 2017 at 10:38 PM, Thomas Groh  wrote:
> JIRAs should only be closed if the issue that they track is no longer
> relevant (either via being fixed or being determined to not be a problem).
> If a JIRA isn't being meaningfully worked on, it should be unassigned (in
> all cases, not just if there's an associated pull request that has not been
> worked on).
>
> +1 on closing PRs with no action from the original author after some
> reasonable time frame (90 days is certainly reasonable; 30 might be too
> short) if the author has not responded to actionable feedback.
>
> On Wed, Aug 16, 2017 at 12:07 PM, Sourabh Bajaj <
> sourabhba...@google.com.invalid> wrote:
>
>> Some projects I have seen close stale PRs after 30 days, saying "Closing
>> due to lack of activity, please feel free to re-open".
>>
>> On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay 
>> wrote:
>>
>> > Sounds like we have consensus. Since this is a new policy, I would
>> suggest
>> > picking the most flexible option for now (90 days) and we can tighten it
>> in
>> > the future. To answer Kenn's question, I do not know, how other projects
>> > handle this. I did a basic search but could not find a good answer.
>> >
>> > What mechanism can we use to close PRs, assuming that author will be out
>> of
>> > communication. We can push a commit with a "This closes #xyz #abc"
>> message.
>> > Is there another way to do this?
>> >
>> > Ahmet
>> >
>> > On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur  wrote:
>> >
>> > > Makes sense to close after a long time of inactivity and no response,
>> and
>> > > as Kenn mentioned they can always re-open.
>> > >
>> > > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré > >
>> > > wrote:
>> > >
>> > > > If we consider the author, it makes sense.
>> > > >
>> > > > Regards
>> > > > JB
>> > > >
>> > > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu 
>> wrote:
>> > > > >The proposal makes sense.
>> > > > >
>> > > > >If the author of PR doesn't respond for 90 days, the PR is likely
>> out
>> > > > >of
>> > > > >sync with current repo.
>> > > > >
>> > > > >Cheers
>> > > > >
>> > > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay
>> > > >
>> > > > >wrote:
>> > > > >
>> > > > >> Hi all,
>> > > > >>
>> > > > >> Do we have an existing policy for handling stale PRs? If not, could
>> > > > >> we come up with one? We are getting close to 100 open PRs. Some of
>> > > > >> the open PRs have not been touched for a while, and if we exclude
>> > > > >> the pings the number will be higher.
>> > > > >>
>> > > > >> For example, we could close PRs that have not been updated by the
>> > > > >> original author for 90 days even after multiple attempts to reach
>> > > > >> them (e.g. [1], [2] are such PRs.)
>> > > > >>
>> > > > >> What do you think?
>> > > > >>
>> > > > >> Thank you,
>> > > > >> Ahmet
>> > > > >>
>> > > > >> [1] https://github.com/apache/beam/pull/1464
>> > > > >> [2] https://github.com/apache/beam/pull/2949
>> > > > >>
>> > > >
>> > >
>> >
>>


Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
bq. JIRAs should still stay open but should become unassigned

The above would need admin privileges, right?
Is there an automated way to do it?

bq. Prevent contributors/committers from taking more than 'n' JIRAs at the
same time

It would be hard to determine the N above since the amount of coding /
testing varies greatly across JIRAs.



On Wed, Aug 16, 2017 at 3:20 PM, Ismaël Mejía  wrote:

> Thanks Ahmet for bringing up this subject.
>
> +1 to closing stale PRs automatically after a fixed time of inactivity.
> 90 days is ok, but maybe a shorter period is better, if we consider that
> being stale just means not having any activity, i.e., the author of the PR
> does not answer any message. The author can buy extra time just by adding
> a message saying 'wait, I am still working on this' and win a complete
> period of time, so the longer the staleness period is, the longer it can
> eventually be extended.
>
> I agree with Thomas that the JIRAs should still stay open but should become
> unassigned, because the issue won't be fixed yet but we want to encourage
> people to work on it.
>
> Another subject that makes sense to discuss here is whether we need
> policies to avoid 'stale' JIRAs (JIRAs that have been taken but that don't
> have progress), for example:
>
> - Prevent contributors/committers from taking more than 'n' JIRAs at the
>   same time (we should define this n considering the period of staleness,
>   maybe 10?).
>
> - Automatically free 'stale' JIRAs after a fixed time period with no
>   active work
>
> Remember the objective is to encourage more people to contribute, but people
> won't be encouraged to contribute on subjects that other people have
> taken; this is a well-known anti-pattern in volunteer communities, see
> http://communitymgt.wikia.com/wiki/Cookie_Licking
>
> On Wed, Aug 16, 2017 at 10:38 PM, Thomas Groh 
> wrote:
> > JIRAs should only be closed if the issue that they track is no longer
> > relevant (either via being fixed or being determined to not be a
> > problem). If a JIRA isn't being meaningfully worked on, it should be
> > unassigned (in all cases, not just if there's an associated pull request
> > that has not been worked on).
> >
> > +1 on closing PRs with no action from the original author after some
> > reasonable time frame (90 days is certainly reasonable; 30 might be too
> > short) if the author has not responded to actionable feedback.
> >
> > On Wed, Aug 16, 2017 at 12:07 PM, Sourabh Bajaj <
> > sourabhba...@google.com.invalid> wrote:
> >
> >> Some projects I have seen close stale PRs after 30 days, saying "Closing
> >> due to lack of activity, please feel free to re-open".
> >>
> >> On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay 
> >> wrote:
> >>
> >> > Sounds like we have consensus. Since this is a new policy, I would
> >> > suggest picking the most flexible option for now (90 days) and we can
> >> > tighten it in the future. To answer Kenn's question, I do not know how
> >> > other projects handle this. I did a basic search but could not find a
> >> > good answer.
> >> >
> >> > What mechanism can we use to close PRs, assuming that the author will
> >> > be out of communication? We can push a commit with a "This closes #xyz
> >> > #abc" message. Is there another way to do this?
> >> >
> >> > Ahmet
> >> >
> >> > On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur wrote:
> >> >
> >> > > Makes sense to close after a long time of inactivity and no
> >> > > response, and as Kenn mentioned they can always re-open.
> >> > >
> >> > > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré
> >> > > <j...@nanthrax.net> wrote:
> >> > >
> >> > > > If we consider the author, it makes sense.
> >> > > >
> >> > > > Regards
> >> > > > JB
> >> > > >
> >> > > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu wrote:
> >> > > > >The proposal makes sense.
> >> > > > >
> >> > > > >If the author of PR doesn't respond for 90 days, the PR is likely
> >> > > > >out of sync with current repo.
> >> > > > >
> >> > > > >Cheers
> >> > > > >
> >> > > > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay wrote:
> >> > > > >
> >> > > > >> Hi all,
> >> > > > >>
> >> > > > >> Do we have an existing policy for handling stale PRs? If not,
> >> > > > >> could we come up with one? We are getting close to 100 open
> >> > > > >> PRs. Some of the open PRs have not been touched for a while,
> >> > > > >> and if we exclude the pings the number will be higher.
> >> > > > >>
> >> > > > >> For example, we could close PRs that have not been updated by
> >> > > > >> the original author for 90 days even after multiple attempts
> >> > > > >> to reach them (e.g. [1], [2] are such PRs.)
> >> > > > >>
> >> > > > >> What do you think?
> >> > > > >>
> >> > > > >> Thank you,
> >> > > > >> Ahmet
> >> > > > >>
> >> > > > >> [1] https://github.com/apache/beam/pull/1464
> >> > > > >

Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread María García Herrero
Welcome Gris, Umang, and Justin!

On Wed, Aug 16, 2017 at 1:15 AM, Jean-Baptiste Onofré 
wrote:

> Welcome !
>
> Regards
> JB
>
> On Aug 16, 2017, 08:54, at 08:54, "Ismaël Mejía" 
> wrote:
> >Hello and welcome Griselda, Umang, Justin
> >
> >Apart from the links provided by Ahmet you might read Beam-related
> >material on the website (See Documentation > Programming Guide and
> >Documentation > Additional Resources among others).
> >
> >But probably as important as improving your Beam related knowledge is
> >to understand the principles of an open source project and more
> >concretely the way the Apache projects work (in case this is your
> >first Apache project), concepts like How projects are structured
> >(PMCs, committers, votes, etc) and the most important ones Community
> >over Code and Meritocracy.
> >
> >https://www.apache.org/foundation/how-it-works.html
> >https://blogs.apache.org/foundation/entry/asf_15_community_over_code
> >
> >Welcome all and don't hesitate to ask questions, we are all here to
> >make this project better so for sure we can help.
> >Ismaël
> >
> >
> >On Tue, Aug 15, 2017 at 11:04 PM, Justin T  wrote:
> >> Hello Beam community,
> >>
> >> I am also a new member, and I feel a little better knowing that there
> >> are others in the same boat :)
> >>
> >> My name is Justin and I work as a full stack engineer for Neustar, a
> >> marketing analytics company in San Diego. Over the past few weeks I
> >> have been getting more familiar with Beam via documentation, papers,
> >> videos, and the old email archives and I am very excited to start
> >> making contributions. Thank you Altay for the useful links!
> >>
> >> -Justin Tumale
> >>
> >> On Tue, Aug 15, 2017 at 11:19 AM, Ahmet Altay wrote:
> >>
> >>> Welcome both of you!
> >>>
> >>> Some helpful starting points:
> >>> - Contribution guide [1]
> >>> - Unassigned starter issues in JIRA [2]
> >>>
> >>> Ahmet
> >>>
> >>> [1] https://beam.apache.org/contribute/contribution-guide/
> >>> [2] https://issues.apache.org/jira/browse/BEAM-2632?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20starter%20AND%20assignee%20in%20(EMPTY)%20ORDER%20BY%20created%20DESC%2C%20priority%20DESC
> >>>
> >>> On Tue, Aug 15, 2017 at 11:13 AM, Umang Sharma 
> >>> wrote:
> >>>
> >>> > Hi Gris,
> >>> > Nice to meet you.
> >>> >
> >>> > I'd like to take this opportunity to introduce myself to you and
> >>> > everyone else in the dev team.
> >>> >
> >>> > I’m Umang Sharma. I'm an associate in Data Science and Applications
> >>> > at Accenture Digital.
> >>> >
> >>> >
> >>> > I write in Python, Java and a number of other languages.
> >>> > I'd love to contribute to Beam. It'd be great if someone guides me
> >>> > to get started with contributing :)
> >>> >
> >>> > Among the other things I like are polo golf, giving talks and
> >>> > talking about my work.
> >>> >
> >>> > Thanks,
> >>> > Umang
> >>> >
> >>> >
> >>> > On Aug 15, 2017 22:40, "Griselda Cuevas" 
> >>> wrote:
> >>> >
> >>> > Hi Beam community,
> >>> >
> >>> > I’m Griselda (Gris) Cuevas and I’m very excited to join the
> >>> > community, I’m looking forward to learning awesome things from you
> >>> > and to getting the chance to collaborate on great initiatives.
> >>> >
> >>> > I’m currently working at Google and I’m studying for a master's in
> >>> > operations research and data science at UC Berkeley. I’m interested
> >>> > in Natural Language Processing, Information Retrieval and Online
> >>> > Communities. Some other fun topics I love are juggling, camping and
> >>> > -just getting into it- listening to podcasts, so if you ever want to
> >>> > discuss and talk about any of these topics, here I am!
> >>> >
> >>> > Another reason why I’m here is because I want to help this project
> >>> > grow and thrive. This means that you’ll see me contributing to the
> >>> > project, reaching out to ask questions as I get familiar with our
> >>> > community, and also helping evangelize Apache Beam by organizing
> >>> > meetups, hangouts, etc.
> >>> >
> >>> > I say bye for now, I’ll see you around,
> >>> >
> >>> > Cheers,
> >>> >
> >>> > G
> >>> >
> >>>
>


New blog post: Splittable DoFn

2017-08-16 Thread Eugene Kirpichov
Hi all,

The blog post "Powerful and modular IO connectors with Splittable DoFn in
Apache Beam" just went live - take a look!

*One of the most important parts of the Apache Beam ecosystem is its
quickly growing set of connectors that allow Beam pipelines to read and
write data to various data storage systems (“IOs”). Currently, Beam ships
over 20 IO connectors with many more in active development. As user demands
for IO connectors grew, our work on improving the related Beam APIs (in
particular, the Source API) produced an unexpected result: a generalization
of Beam’s most basic primitive, DoFn.*

Thanks to all the reviewers of the PR for edit suggestions!