Re: [DISCUSS] Graduation to a top-level project

2016-12-20 Thread Davor Bonaci
A quick update: a meeting of the ASF Board of Directors is scheduled for
later this week, at which the Board may consider taking action on our
graduation proposal!

That said, even if the Board does enact it, any public announcement is
expected to be delayed to the first half of January due to the holidays.

In the meantime, we are still an Incubator podling and should continue to
operate as such until the ASF announces otherwise... so, please hold your
speculation and enthusiasm for another few weeks ;-)

Davor

On Thu, Dec 8, 2016 at 11:09 PM, Jean-Baptiste Onofré 
wrote:

> Congrats all !
>
> Regards
> JB
>
>
> On 12/09/2016 12:42 AM, Davor Bonaci wrote:
>
>> A quick update: the Apache Incubator has adopted the proposed graduation
>> resolution [1], and it is now presented to the ASF Board of Directors for
>> their consideration.
>>
>> Davor
>>
>> [1]
>> https://lists.apache.org/thread.html/71a1c63837a7d1506a10af9
>> c70af1c24db988451ac5b53fa2467b9b8@%3Cgeneral.incubator.apache.org%3E
>>
>> On Mon, Dec 5, 2016 at 10:35 AM, Neelesh Salian wrote:
>>
>>> Quite an interesting discussion. Looking forward to the graduation. :)
>>> Thanks for putting this together.
>>>
>>> On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci wrote:
>>>
>>>> A quick update: the vote within the Incubator has been started [1].
>>>>
>>>> Davor
>>>>
>>>> [1]
>>>> https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E
>>>>
>>>> On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci wrote:
>>>>
>>>>> A quick update on the progress: the PPMC is nearly complete drafting the
>>>>> proposed resolution, and I've just kicked off the discussion within the
>>>>> Incubator community [1].
>>>>>
>>>>> I'd encourage everyone to participate in the discussion and carry your
>>>>> enthusiasm there. Thanks!
>>>>>
>>>>> Davor
>>>>>
>>>>> [1] https://lists.apache.org/thread.html/b9c1071b35558846836814575ada3cdca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
>>>>>
>>>>> On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> I see a healthy project which deserves to graduate.
>>>>>>
>>>>>> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci wrote:
>>>>>>
>>>>>>> Thanks everyone for the enthusiastic support!
>>>>>>>
>>>>>>> Please keep the thread going, as we kick off the process on private@.
>>>>>>> Please don’t forget to bring up any data points that might help strengthen
>>>>>>> our case.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner wrote:
>>>>>>>
>>>>>>>> +1 (beaming)
>>>>>>>>
>>>>>>>> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>> The community is doing very well and behaving very Apache
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot <
>>>>>>>>>>
>

Re: [VOTE] Release 0.4.0-incubating, release candidate #3

2016-12-18 Thread Davor Bonaci
Indeed -- I did help JB with the release ever so slightly, due to the
networking connectivity issue reaching repository.apache.org, which JB
further described and is tracked in INFRA-13086 [1]. This is not
Beam-specific.

The current signature shouldn't be a problem at all, but, since others are
asking about it, I think it would be best to simply re-sign the source
.zip archive and continue this vote. JB, what do you think?

Regarding the release itself, I think we need to keep raising the quality
and maturity release-over-release, and test signals are an excellent way to
demonstrate that. Due to the recent upgrades to Jenkins, usage of the DSL,
etc. (thanks INFRA and Jason Kuster), we can now, for the first time,
formally show that the release candidate clearly passes all Jenkins suites
that we have:
* All unit tests across the project, plus example ITs across all runners
[2], [3].
* All integration tests on the Apex runner [4].
* All integration tests on the Flink runner [5].
* All integration tests on the Spark runner [6].
* All integration tests on the Dataflow runner [7].

That said, I know of a few issues/regressions in the areas that are not
well tested today. I think Dan Halperin has more context, so I'll let him
speak of the details, and quote relevant JIRA issues.

With the known issues in 0.3.0-incubating, such as trouble running examples
out-of-the-box, I think this release candidate is a clear win. Of course,
that may change if more issues are discovered.

For me, this release candidate is +1 (at this time), contingent upon no
known major issues affecting Apex, Flink and Spark runners.

Davor

[1] https://issues.apache.org/jira/browse/INFRA-13086
[2]
https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/5994/
[3]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/2116/
[4]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Apex/10/
[5]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Flink/1120/
[6]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Spark/430/
[7]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1830/


On Sat, Dec 17, 2016 at 4:13 PM, Kenneth Knowles 
wrote:

> +1, as long as it is fine for the release to be signed by a PMC member
> other than the release manager. Otherwise need to replace the .asc file.
>
> Following [Apache release checklist](
> http://incubator.apache.org/guides/releasemanagement.html#check-list):
>
> 1.1 Verified checksums & signature (Davor's)
> 2.1 Ran unit tests and integration tests
> 3.1 DISCLAIMER is correct
> 3.2 LICENSE & NOTICE are correct
> 3.3 Files have license headers (RAT & checkstyle)
> 3.4 Provenance is clear
> 3.5 Dependency licenses are legal (RAT) [2]
> 3.6 Release contains source code, no binaries
>
> Additionally:
>
>  - Went over the generated javadoc (filed tickets but no release blockers)
>  - Went over the generated release notes
>  - Sanity checked the Maven Central artifacts
>  - Confirmed that the git tag matches
>  - Checked the website PR
>
> I heartily agree that the components would give much better context on
> tickets. Even with that, our JIRA titles could use a lot of improvement.
>
>
> On Fri, Dec 16, 2016 at 5:06 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #3 for the version
> > 0.4.0-incubating, as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > * all artifacts to be deployed to the Maven Central Repository [3],
> > * source code tag "v0.4.0-incubating-RC3" [4],
> > * website pull request listing the release and publishing the API reference
> > manual [5].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PPMC affirmative votes.
> >
> > Thanks,
> > Regards
> > JB
> >
> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338590
> > [2] https://dist.apache.org/repos/dist/dev/incubator/beam/0.4.0-incubating/
> > [3] https://repository.apache.org/content/repositories/orgapachebeam-1008/
> > [4] https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=112e38e4a68b07e6bf4916d1bdcc7ecaca8bbbd4
> > [5] https://github.com/apache/incubator-beam-site/pull/109
> >
>
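
The "verified checksums" step in the checklist quoted above can be sketched roughly as follows. This is only an illustration, not the procedure anyone on the thread actually ran: the thread does not say which digest algorithm was published, so SHA-256 is an assumption, and the class and method names (ChecksumCheckSketch, hexDigest, matches) are hypothetical. Signature (.asc) verification is a separate step and is not covered here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheckSketch {

    /** Returns the lowercase hex digest of the given bytes. */
    static String hexDigest(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            // Mask to 0..255 so negative bytes still print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Compares the computed digest with a published checksum line.
     * Assumption: the line is either "hash" or "hash  filename".
     */
    static boolean matches(String algorithm, byte[] data, String publishedLine)
            throws NoSuchAlgorithmException {
        String expected = publishedLine.trim().split("\\s+")[0];
        return hexDigest(algorithm, data).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // In-memory bytes stand in for the downloaded source archive.
        byte[] artifact = "abc".getBytes(StandardCharsets.UTF_8);
        String published =
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  example.zip";
        System.out.println(matches("SHA-256", artifact, published)); // true
        System.out.println(matches("SHA-256", artifact, "deadbeef")); // false
    }
}
```

For a real release candidate, the bytes would be read from the downloaded archive and the expected value from the checksum file published next to it.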


Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Davor Bonaci
I think we should build another RC.

Two issues:
* Metrics issue that JB pointed out earlier. It seems to cause a somewhat
poor user experience for every pipeline executed on the Direct runner.
(Thanks JB for finding this out!)
* Failure of testSideInputsWithMultipleWindows in Jenkins [1].

Both issues seem easy, trivial, non-risky fixes that are already committed
to master. I'd suggest just taking them.

Davor

[1]
https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1819/

On Thu, Dec 15, 2016 at 8:45 AM, Ismaël Mejía  wrote:

> +1 (non-binding)
>
> - verified signatures + checksums
> - ran mvn clean verify -Prelease, all artifacts+tests ran smoothly
>
> The release artifacts are signed with the key with fingerprint 8F0D334F
> https://dist.apache.org/repos/dist/release/incubator/beam/KEYS
>
> I just created a JIRA to add the signer/KEYS information in the release
> template, I will do a PR for this later on.
>
> Ismaël
>
> On Thu, Dec 15, 2016 at 2:26 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi Amit,
> >
> > thanks for the update.
> >
> > As you changed the Jira, the Release Notes are now up to date.
> >
> > Regards
> > JB
> >
> >
> > On 12/15/2016 02:20 PM, Amit Sela wrote:
> >
> >> I see three problems in the release notes (related to Spark runner):
> >>
> >> Improvement:
> >> 
> >> [BEAM-757] - The SparkRunner should utilize the SDK's DoFnRunner instead
> >> of
> >> writing it's own.
> >> 
> >> [BEAM-807] - [SparkRunner] Replace OldDoFn with DoFn
> >> 
> >> [BEAM-855] - Remove the need for --streaming option in the spark runner
> >>
> >> BEAM-855 is duplicate and probably shouldn't have had a Fix Version.
> >>
> >> The other two are not a part of this release - I was probably too eager to
> >> mark them fixed after merge and I accidentally put 0.4.0 as the Fix
> >> Version.
> >>
> >> I made the changes in JIRA now.
> >>
> >> Thanks,
> >> Amit
> >>
> >> On Thu, Dec 15, 2016 at 3:09 PM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >>> Reviewing and testing the release, I see:
> >>>
> >>> 16/12/15 14:04:47 ERROR MetricsContainer: Unable to update metrics on
> >>> the current thread. Most likely caused by using metrics outside the
> >>> managed work-execution thread.
> >>>
> >>> It doesn't block the execution of the pipeline, but basically, it means
> >>> that metrics don't work anymore.
> >>>
> >>> I'm investigating.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 12/15/2016 01:46 PM, Jean-Baptiste Onofré wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> Please review and vote on the release candidate #1 for the version
> >>>> 0.4.0-incubating, as follows:
> >>>> [ ] +1, Approve the release
> >>>> [ ] -1, Do not approve the release (please provide specific comments)
> >>>>
> >>>> The complete staging area is available for your review, which includes:
> >>>> * JIRA release notes [1],
> >>>> * the official Apache source release to be deployed to dist.apache.org [2],
> >>>> * all artifacts to be deployed to the Maven Central Repository [3],
> >>>> * source code tag "v0.4.0-incubating-RC1" [4],
> >>>> * website pull request listing the release and publishing the API reference
> >>>> manual [5].
> >>>>
> >>>> The vote will be open for at least 72 hours. It is adopted by majority
> >>>> approval, with at least 3 PPMC affirmative votes.
> >>>>
> >>>> Thanks,
> >>>> Regards
> >>>> JB
> >>>>
> >>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338590
> >>>> [2] https://dist.apache.org/repos/dist/dev/incubator/beam/0.4.0-incubating/
> >>>> [3] https://repository.apache.org/content/repositories/orgapachebeam-1006/
> >>>> [4] https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=85d1c8a2f85bbc667c90f55ff0eb27de5c2446a6
> >>>> [5] https://github.com/apache/incubator-beam-site/pull/109
> >>>>
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
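
The adoption rule stated in the vote call above ("open for at least 72 hours", "majority approval, with at least 3 PPMC affirmative votes") can be written down as a small check. This is a sketch under assumptions: the thread does not spell out how non-binding votes enter the majority, so the code below counts only binding (PPMC) votes, and the names ReleaseVoteSketch, Vote and passes are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ReleaseVoteSketch {

    /** One vote: value is +1 or -1; binding is true for a PPMC member. */
    static final class Vote {
        final int value;
        final boolean binding;

        Vote(int value, boolean binding) {
            this.value = value;
            this.binding = binding;
        }
    }

    /**
     * Assumed reading of the rule: the vote has been open for at least 72 hours,
     * there are at least three binding +1 votes, and binding +1 votes
     * outnumber binding -1 votes.
     */
    static boolean passes(List<Vote> votes, long hoursOpen) {
        int bindingPlus = 0;
        int bindingMinus = 0;
        for (Vote v : votes) {
            if (!v.binding) {
                continue; // non-binding votes are advisory in this sketch
            }
            if (v.value > 0) {
                bindingPlus++;
            } else if (v.value < 0) {
                bindingMinus++;
            }
        }
        return hoursOpen >= 72 && bindingPlus >= 3 && bindingPlus > bindingMinus;
    }

    public static void main(String[] args) {
        List<Vote> votes = Arrays.asList(
            new Vote(+1, true), new Vote(+1, true), new Vote(+1, true),
            new Vote(+1, false), new Vote(-1, true));
        System.out.println(passes(votes, 72)); // true
        System.out.println(passes(votes, 48)); // false: not open long enough
    }
}
```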


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Davor Bonaci
>
> I wanted to suggest if we can have sort of a window or timeline for
> feature/bug code freeze prior to release to ensure stability?
>

Release branches are a typical solution; I think we just need to get better
at using them appropriately.


Re: Review on Jira for 0.4.0-incubating

2016-12-13 Thread Davor Bonaci
I'd suggest proceeding with 0.4.0-incubating (as JB previously planned).

My reasoning: I don't think we'll be able to release a non-incubating
release next week, regardless of the Board's graduation decision. I think
it will take a while (more details to follow on a separate thread). On the
other hand, 0.3.0-incubating has some important issues (e.g., template
projects don't work across runners, WordCount has issues on Windows OS). I
think it makes sense to fix these issues for our users, and have a better
product if/when the graduation announcement comes.

On Tue, Dec 13, 2016 at 9:05 AM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> Either way is fine for me too.
>
> We discussed the release schedule independently from the graduation
> process; that's why 0.4.0-incubating was planned around today.
>
> Regards
> JB
>
>
> On 12/13/2016 06:02 PM, Daniel Kulp wrote:
>
>> Hate to suggest this….
>>
>> Assuming the Board OK’s the graduation next Wednesday, if we wait till
>> then to do the build, we can drop the incubator stuff entirely and it
>> could be a “first release” outside of incubation.   We could avoid the
>> extra vote on the incubator list, etc….
>>
>> Would it make sense to delay the week?   Not a big deal either way, but I
>> don’t think I’ve ever seen a project do a release between the graduation
>> vote and the board vote.   Every project I’ve seen decided to wait to have
>> the “we’ve graduated!” release.
>>
>> Dan
>>
>>
>>
>> On Dec 13, 2016, at 9:43 AM, Dan Halperin wrote:
>>
>>> Update: we think we've knocked off all the 0.4.0-incubating blockers,
>>> including postponing some. JB is going to start the release process soon!
>>>
>>> On Sat, Dec 3, 2016 at 10:42 PM, Jean-Baptiste Onofré wrote:
>>>
>>>> Very good point Frances.
>>>>
>>>> Definitely something we have to do.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 12/04/2016 07:38 AM, Frances Perry wrote:
>>>>
>>>>> Sounds great, JB!
>>>>>
>>>>> The major blocker in my opinion is to finish the polishing pass on the
>>>>> quickstarts and example archetypes, so that users will have a great
>>>>> experience trying out 0.4.0-incubating. I know we've made some significant
>>>>> progress there in the last few weeks, but I don't think we've quite
>>>>> finished. For example, https://issues.apache.org/jira/browse/BEAM-909 is
>>>>> unresolved and marked as 0.4.0-incubating.
>>>>>
>>>>> On Sat, Dec 3, 2016 at 10:26 PM, Jean-Baptiste Onofré wrote:
>>>>>
>>>>>> Hi beamers,
>>>>>>
>>>>>> We plan a 0.4.0-incubating release pretty soon. I propose to manage this
>>>>>> release.
>>>>>>
>>>>>> I started to review the Jira with fix version set to 0.4.0-incubating.
>>>>>>
>>>>>> Please, update the fix version in Jira if you are working on specific Jira
>>>>>> and you want to include in the 0.4.0-incubating release.
>>>>>>
>>>>>> Thanks
>>>>>> Regards
>>>>>> JB
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbono...@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1787

2016-12-11 Thread Davor Bonaci
>
> Is there any way to retry staging if it fails?
>>
>
I believe the code already does this, but Pei would know for sure.


Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-09 Thread Davor Bonaci
The sooner, the better. I think we should move forward with this.

On Thu, Dec 8, 2016 at 10:56 PM, Sergio Fernández  wrote:

> +1
> Definitely a confusing API when you first jump into building a PTransform.
> For such API change, the sooner the better, I think.
>
> On Wed, Dec 7, 2016 at 10:37 PM, Kenneth Knowles 
> wrote:
>
> > Hi all,
> >
> > I want to bring up another major backwards-incompatible change before it is
> > too late, to resolve [BEAM-438].
> >
> > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > PTransform.expand. I have opened [PR #1538] just for reference (it took 30
> > seconds using IDE automated refactor)
> >
> > This change affects *PTransform authors* but does *not* affect pipeline
> > authors.
> >
> > This issue was filed a long time ago. It has been a problem many times with
> > actual users since before Beam started incubating. This is what goes wrong
> > (often):
> >
> >PCollection input = ...
> >PTransform, ...> transform = ...
> >
> >transform.apply(input)
> >
> > This type checks and even looks perfectly normal. Do you see the error?
> >
> > ... what we need the user to write is:
> >
> > input.apply(transform)
> >
> > What a confusing difference! After all, the first one type-checks and the
> > first one is how you apply a Function or Predicate or SerializableFunction,
> > etc. But it is broken. With transform.apply(input) the transform is not
> > registered with the pipeline at all.
> >
> > We obviously can't (and don't want to) change the most core way that
> > pipeline authors use Beam, so PInput.apply (aka PCollection.apply) must
> > remain the same. But we do need a way to make it impossible to mix these
> > up.
> >
> > The simplest way I can think of is to choose a new name for the other
> > method involved. Users probably won't write transform.expand(input) since
> > they will never have seen it in any examples, etc. This will just make
> > PTransform authors need to do a global rename, and the type system will
> > direct them to all cases so there is no silent failure possible.
> >
> > What do you think?
> >
> > Kenn
> >
> > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> >
> > p.s. there is a really amusing and confusing call chain: PCollection.apply
> > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > PipelineRunner.apply -> PTransform.apply
> >
> > After this change and work to get the runner out of the loop, it becomes
> > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> >
>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>
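
The mix-up Kenn describes in the quoted message can be reproduced with a toy model. The sketch below deliberately does not use Beam's real classes: ToyPipeline, ToyCollection, ToyTransform and UpperCase are hypothetical stand-ins, and the only thing carried over from the thread is the proposed naming (apply on the collection, expand on the transform). It shows that calling the transform's method directly type-checks and produces output, yet never registers the transform with the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ApplyVsExpandSketch {

    /** Toy pipeline: only records the names of transforms applied through it. */
    static class ToyPipeline {
        final List<String> registered = new ArrayList<>();
    }

    /** Toy transform (hypothetical, not Beam's PTransform). */
    interface ToyTransform {
        String name();

        ToyCollection expand(ToyCollection input);
    }

    /** Toy collection (hypothetical, not Beam's PCollection). */
    static class ToyCollection {
        final ToyPipeline pipeline;
        final List<String> elements;

        ToyCollection(ToyPipeline pipeline, List<String> elements) {
            this.pipeline = pipeline;
            this.elements = elements;
        }

        /** Pipeline-author entry point: registers the transform, then expands it. */
        ToyCollection apply(ToyTransform transform) {
            pipeline.registered.add(transform.name());
            return transform.expand(this);
        }
    }

    static class UpperCase implements ToyTransform {
        public String name() {
            return "UpperCase";
        }

        public ToyCollection expand(ToyCollection input) {
            List<String> out = new ArrayList<>();
            for (String s : input.elements) {
                out.add(s.toUpperCase(Locale.ROOT));
            }
            return new ToyCollection(input.pipeline, out);
        }
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline();
        ToyCollection input = new ToyCollection(pipeline, Arrays.asList("a", "b"));
        ToyTransform transform = new UpperCase();

        // The wrong way round: compiles and runs, but bypasses registration.
        transform.expand(input);
        System.out.println(pipeline.registered); // []

        // The intended way: the collection applies the transform.
        input.apply(transform);
        System.out.println(pipeline.registered); // [UpperCase]
    }
}
```

With distinct method names, the direct call is at least visibly different from what examples show, which is the point of the rename proposed in the thread.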


Re: [DISCUSS] Graduation to a top-level project

2016-12-08 Thread Davor Bonaci
A quick update: the Apache Incubator has adopted the proposed graduation
resolution [1], and it is now presented to the ASF Board of Directors for
their consideration.

Davor

[1]
https://lists.apache.org/thread.html/71a1c63837a7d1506a10af9c70af1c24db988451ac5b53fa2467b9b8@%3Cgeneral.incubator.apache.org%3E

On Mon, Dec 5, 2016 at 10:35 AM, Neelesh Salian 
wrote:

> Quite an interesting discussion. Looking forward to the graduation. :)
> Thanks for putting this together.
>
> On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci  wrote:
>
> > A quick update: the vote within the Incubator has been started [1].
> >
> > Davor
> >
> > [1]
> > https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761
> > ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E
> >
> > On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci  wrote:
> >
> > > A quick update on the progress: the PPMC is nearly complete drafting the
> > > proposed resolution, and I've just kicked off the discussion within the
> > > Incubator community [1].
> > >
> > > I'd encourage everyone to participate in the discussion and carry your
> > > enthusiasm there. Thanks!
> > >
> > > Davor
> > >
> > > [1] https://lists.apache.org/thread.html/b9c1071b35558846836814575ada3cdca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
> > >
> > > On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels 
> > > wrote:
> > >
> > >> +1
> > >>
> > >> I see a healthy project which deserves to graduate.
> > >>
> > >> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci wrote:
> > >> > Thanks everyone for the enthusiastic support!
> > >> >
> > >> > Please keep the thread going, as we kick off the process on private@.
> > >> > Please don’t forget to bring up any data points that might help strengthen
> > >> > our case.
> > >> >
> > >> > Thanks!
> > >> >
> > >> > On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner
> > >> 
> > >> > wrote:
> > >> >
> > >> >> +1 (beaming)
> > >> >>
> > >> >> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
> > >> >> 
> > >> >> wrote:
> > >> >>
> > >> >> +1
> > >> >>
> > >> >> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik
> > >> >> wrote:
> > >> >> > +1
> > >> >> >
> > >> >> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen 
> > >> wrote:
> > >> >> >
> > >> >> >> +1
> > >> >> >> The community is doing very well and behaving very Apache
> > >> >> >>
> > >> >> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot <
> > >> echauc...@gmail.com>
> > >> >> >> wrote:
> > >> >> >>
> > >> >> >> > A big +1 of course, very excited to go forward
> > >> >> >> >
> > >> >> >> > Etienne
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > Le 22/11/2016 à 19:19, Davor Bonaci a écrit :
> > >> >> >> >
> > >> >> >> >> Hi everyone,
> > >> >> >> >> With all the progress we’ve had recently in Apache Beam, I
> > think
> > >> it
> > >> >> is
> > >> >> >> >> time
> > >> >> >> >> we start the discussion about graduation as a new top-level
> > >> project
> > >> >> at
> > >> >> >> the
> > >> >> >> >> Apache Software Foundation.
> > >> >> >> >>
> > >> >> >> >> Graduation means we are a self-sustaining and self-governing
> > >> >> community,
> > >> >> >> >> and
> > >> >> >> >> ready to be a full participant in the Apache Software
> > >> Foundation. It
> > >> >> >> does
> > >> >> >> >> not imply that our community growth is complete or that a
> > >> pa

Re: [DISCUSS] Graduation to a top-level project

2016-12-05 Thread Davor Bonaci
A quick update: the vote within the Incubator has been started [1].

Davor

[1]
https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E

On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci  wrote:

> A quick update on the progress: the PPMC is nearly complete drafting the
> proposed resolution, and I've just kicked off the discussion within the
> Incubator community [1].
>
> I'd encourage everyone to participate in the discussion and carry your
> enthusiasm there. Thanks!
>
> Davor
>
> [1] https://lists.apache.org/thread.html/b9c1071b35558846836814575ada3c
> dca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
>
> On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels 
> wrote:
>
>> +1
>>
>> I see a healthy project which deserves to graduate.
>>
>> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci  wrote:
>> > Thanks everyone for the enthusiastic support!
>> >
>> > Please keep the thread going, as we kick off the process on private@.
>> > Please don’t forget to bring up any data points that might help strengthen
>> > our case.
>> >
>> > Thanks!
>> >
>> > On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner
>> 
>> > wrote:
>> >
>> >> +1 (beaming)
>> >>
>> >> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
>> >> 
>> >> wrote:
>> >>
>> >> +1
>> >>
>> >> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik
>> >> wrote:
>> >> > +1
>> >> >
>> >> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen 
>> wrote:
>> >> >
>> >> >> +1
>> >> >> The community is doing very well and behaving very Apache
>> >> >>
>> >> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot <
>> echauc...@gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >> > A big +1 of course, very excited to go forward
>> >> >> >
>> >> >> > Etienne
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > Le 22/11/2016 à 19:19, Davor Bonaci a écrit :
>> >> >> >
>> >> >> >> Hi everyone,
>> >> >> >> With all the progress we’ve had recently in Apache Beam, I think
>> it
>> >> is
>> >> >> >> time
>> >> >> >> we start the discussion about graduation as a new top-level
>> project
>> >> at
>> >> >> the
>> >> >> >> Apache Software Foundation.
>> >> >> >>
>> >> >> >> Graduation means we are a self-sustaining and self-governing
>> >> community,
>> >> >> >> and
>> >> >> >> ready to be a full participant in the Apache Software
>> Foundation. It
>> >> >> does
>> >> >> >> not imply that our community growth is complete or that a
>> particular
>> >> >> level
>> >> >> >> of technical maturity has been reached, rather that we are on a
>> solid
>> >> >> >> trajectory in those areas. After graduation, we will still
>> >> periodically
>> >> >> >> report to, and be overseen by, the ASF Board to ensure continued
>> >> growth
>> >> >> of
>> >> >> >> a healthy community.
>> >> >> >>
>> >> >> >> Graduation is an important milestone for the project. It is also
>> key
>> >> to
>> >> >> >> further grow the user community: many users (incorrectly) see
>> >> incubation
>> >> >> >> as
>> >> >> >> a sign of instability and are much less likely to consider us
>> for a
>> >> >> >> production use.
>> >> >> >>
>> >> >> >> A way to think about graduation readiness is through the Apache
>> >> Maturity
>> >> >> >> Model [1]. I think we clearly satisfy all the requirements [2].
>> It is
>> >> >> >> probably worth emphasizing the recent community growth: over
>> each of
>> >> the
>> >> >> >> past three months, no single organization contributing to Beam
>> has
>> >>

Re: Introduction + new contributions

2016-12-03 Thread Davor Bonaci
Welcome!

BEAM-961 is now assigned to you.

On Sat, Dec 3, 2016 at 6:08 AM, Vladisav Jelisavcic 
wrote:

> Hi,
>
> my name is Vladisav, and I would like to get involved in Apache Beam.
> For starters, I'll do something simple, e.g.: BEAM-961
>
> Best regards,
> Vladisav
>


Re: [DISCUSS] Graduation to a top-level project

2016-12-02 Thread Davor Bonaci
A quick update on the progress: the PPMC is nearly complete drafting the
proposed resolution, and I've just kicked off the discussion within the
Incubator community [1].

I'd encourage everyone to participate in the discussion and carry your
enthusiasm there. Thanks!

Davor

[1]
https://lists.apache.org/thread.html/b9c1071b35558846836814575ada3cdca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E

On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels  wrote:

> +1
>
> I see a healthy project which deserves to graduate.
>
> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci  wrote:
> > Thanks everyone for the enthusiastic support!
> >
> > Please keep the thread going, as we kick off the process on private@.
> > Please don’t forget to bring up any data points that might help strengthen
> > our case.
> >
> > Thanks!
> >
> > On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner
> > wrote:
> >
> >> +1 (beaming)
> >>
> >> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
> >> 
> >> wrote:
> >>
> >> +1
> >>
> >> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik 
> >> wrote:
> >> > +1
> >> >
> >> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen 
> wrote:
> >> >
> >> >> +1
> >> >> The community is doing very well and behaving very Apache
> >> >>
> >> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot <
> echauc...@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > A big +1 of course, very excited to go forward
> >> >> >
> >> >> > Etienne
> >> >> >
> >> >> >
> >> >> >
> >> >> > Le 22/11/2016 à 19:19, Davor Bonaci a écrit :
> >> >> >
> >> >> >> Hi everyone,
> >> >> >> With all the progress we’ve had recently in Apache Beam, I think
> it
> >> is
> >> >> >> time
> >> >> >> we start the discussion about graduation as a new top-level
> project
> >> at
> >> >> the
> >> >> >> Apache Software Foundation.
> >> >> >>
> >> >> >> Graduation means we are a self-sustaining and self-governing
> >> community,
> >> >> >> and
> >> >> >> ready to be a full participant in the Apache Software Foundation.
> It
> >> >> does
> >> >> >> not imply that our community growth is complete or that a
> particular
> >> >> level
> >> >> >> of technical maturity has been reached, rather that we are on a
> solid
> >> >> >> trajectory in those areas. After graduation, we will still
> >> periodically
> >> >> >> report to, and be overseen by, the ASF Board to ensure continued
> >> growth
> >> >> of
> >> >> >> a healthy community.
> >> >> >>
> >> >> >> Graduation is an important milestone for the project. It is also
> key
> >> to
> >> >> >> further grow the user community: many users (incorrectly) see
> >> incubation
> >> >> >> as
> >> >> >> a sign of instability and are much less likely to consider us for
> a
> >> >> >> production use.
> >> >> >>
> >> >> >> A way to think about graduation readiness is through the Apache
> >> Maturity
> >> >> >> Model [1]. I think we clearly satisfy all the requirements [2].
> It is
> >> >> >> probably worth emphasizing the recent community growth: over each
> of
> >> the
> >> >> >> past three months, no single organization contributing to Beam has
> >> had
> >> >> >> more
> >> >> >> than ~50% of the unique contributors per month [2, see
> assumptions].
> >> >> >> That’s
> >> >> >> a great statistic that shows how much we’ve grown our diversity!
> >> >> >>
> >> >> >> Process-wise, graduation consists of drafting a board resolution,
> >> which
> >> >> >> needs to identify the full Project Management Committee, and
> getting
> >> it
> >> >> >> approved by the community, the Incubator, and the Board. Within
> the
> >> Beam
> >> >> >> community, most of thes

Re: DataCamp II Salzburg

2016-12-01 Thread Davor Bonaci
This is great! (Please share any recording after the event if available).

On Thu, Dec 1, 2016 at 6:04 AM, Sergio Fernández  wrote:

> Hi folks,
>
> next week we have a DataCamp in Salzburg, a meetup about Big Data:
>
> https://www.meetup.com/Salzburg-Big-Data-Meetup/events/231844168/
>
> I'm going to give a session there introducing Apache Beam.
>
> I'll share the material here afterwards; it is very introductory anyway.
>
> Cheers,
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>


Re: Build failed in Jenkins: beam_SeedJob_Main #19

2016-12-01 Thread Davor Bonaci
Another (temporary) side effect of the pending PR. Please ignore. Jason --
FYI.

On Wed, Nov 30, 2016 at 10:00 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> Changes:
>
> [klk] [BEAM-747] Fix FileChecksumMatcher That Inconsistent With FS
>
> [klk] Improvements to ReduceFnRunner prefetching
>
> [klk] Move PerKeyCombineFnRunners to runners/core.
>
> [klk] Revert "Improvements to ReduceFnRunner prefetching"
>
> [dhalperi] Shutdown DynamicSplit Executor in Cleanup
>
> [tgroh] Preserves compressed windows in PushbackSideInputDoFnRunner
>
> [dhalperi] Update examples archetype with runner profiles
>
> --
> Started by timer
> [EnvInject] - Loading node environment variables.
> Building remotely on beam1 (beam) in workspace  job/beam_SeedJob_Main/ws/>
>  > git rev-parse --is-inside-work-tree # timeout=10
> Fetching changes from the remote Git repository
>  > git config remote.origin.url https://github.com/apache/incubator-beam.git # timeout=10
> Fetching upstream changes from https://github.com/apache/incubator-beam.git
>  > git --version # timeout=10
>  > git -c core.askpass=true fetch --tags --progress https://github.com/apache/incubator-beam.git +refs/heads/*:refs/remotes/origin/* +refs/pull/*:refs/remotes/origin/pr/*
>  > git rev-parse origin/master^{commit} # timeout=10
> Checking out Revision 711c68092fd771c3f9be4a5d0dd0ecf077f1aeab (origin/master)
>  > git config core.sparsecheckout # timeout=10
>  > git checkout -f 711c68092fd771c3f9be4a5d0dd0ecf077f1aeab
>  > git rev-list 8042d52fcb377922a11b9cc5f548690da83a2b1c # timeout=10
> Cleaning workspace
>  > git rev-parse --verify HEAD # timeout=10
> Resetting working tree
>  > git reset --hard # timeout=10
>  > git clean -fdx # timeout=10
> [EnvInject] - Executing scripts and injecting environment variables after
> the SCM step.
> [EnvInject] - Injecting as environment variables the properties content
> SPARK_LOCAL_IP=127.0.0.1
>
> [EnvInject] - Variables injected successfully.
> ERROR: no Job DSL script(s) found at .jenkins/job_*.groovy
>
>


Re: Jenkins precommit worker affinity

2016-12-01 Thread Davor Bonaci
Configuration fixed. The problem was job parallelism, not affinity.

Jason, can you fix this in your pending PR?

On Wed, Nov 30, 2016 at 11:57 PM, Kenneth Knowles 
wrote:

> It appears that the new job beam_PreCommit_Java_MavenInstall has an
> affinity for Jenkins worker beam3 while workers beam1 and beam2 sit idle.
> Is this intentional? There seems to be a backlog of half a dozen builds.
>


Re: Jenkins build became unstable: beam_PostCommit_MavenVerify #1906

2016-11-26 Thread Davor Bonaci
https://issues.apache.org/jira/browse/BEAM-754

On Sat, Nov 26, 2016 at 5:06 AM, Amit Sela  wrote:

> Following build #1907 succeeded. Probably just a flake. I'll follow up.
>
> On Sat, Nov 26, 2016, 13:46 Amit Sela  wrote:
>
> > Seems to fail on DataflowRunner "WordCountIT.testE2EWordCount".
> > *Error*:
> > *Expected: Expected checksum is (508517575eba8d8d5a54f7f0080a00951cfe84ca)*
> > * but: was (cfdcdcec05fc8424abc168bf5b0c0ed66e376547)*
> >
> > Anyone with access (and knowledge) to the Dataflow runner could take a
> > look ? Thanks!
> >
> > -- Forwarded message -
> > From: Apache Jenkins Server 
> > Date: Sat, Nov 26, 2016 at 1:35 PM
> > Subject: Jenkins build became unstable: beam_PostCommit_MavenVerify #1906
> > To: , 
> >
> >
> > See <
> > https://builds.apache.org/job/beam_PostCommit_MavenVerify/1906/changes>
> >
> >
>


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #242

2016-11-23 Thread Davor Bonaci
The dependency analysis seems to have failed for the Apex runner:

[INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
beam-runners-apex ---
[WARNING] Used undeclared dependencies found:
[WARNING]    commons-io:commons-io:jar:2.4:compile
[WARNING]    com.datatorrent:netlet:jar:1.3.0:compile
[WARNING]    org.apache.hadoop:hadoop-common:jar:2.6.0:compile

Post-commit seems good, no changes in the Apex runner since it last passed,
and dependency analysis is known to be flaky. Unless somebody has an idea
how to de-flake this, I think we can ignore.

On Tue, Nov 22, 2016 at 11:24 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  NightlySnapshot/242/changes>
>
> Changes:
>
> [klk] Add JUnit category for stateful ParDo tests
>
> [klk] Reject stateful DoFn in ApexRunner
>
> [klk] Add JUnit category for stateful ParDo tests
>
> [klk] Reject stateful DoFn in SparkRunner
>
> [davor] Beam archetypes: enable snapshot repositories.
>
> [lcwik] [BEAM-59] Drops public constructors and uses Factory methods in
>
> [lcwik] [BEAM-59] Create IOChannelFactoryRegistrar interface and its
> gcs/file
>
> [lcwik] [BEAM-59] Use ServiceLoader to register IOChannelFactories in
>
> [tgroh] Update StarterPipeline
>
> [klk] Reject stateful DoFn in FlinkRunner
>
> [tgroh] Simplify the API for managing MetricsEnvironment
>
> [klk] Output Keyed Bundles in GroupAlsoByWindowEvaluator
>
> [klk] Add TransformHierarchyTest
>
> --
> [...truncated 8129 lines...]
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/org/
> apache/beam/runners/apex/translation/operators/package-use.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/org/
> apache/beam/runners/apex/translation/utils/package-use.html...>
> Building index for all the packages and classes...
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/overview-tree.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/index-all.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/deprecated-list.html...>
> Building index for all classes...
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/allclasses-frame.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/allclasses-noframe.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/index.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/overview-summary.html...>
> Generating  NightlySnapshot/ws/runners/apex/target/apidocs/help-doc.html...>
> [INFO] Building jar:  NightlySnapshot/ws/runners/apex/target/beam-runners-apex-
> 0.4.0-incubating-SNAPSHOT-javadoc.jar>
> [INFO]
> [INFO] --- maven-source-plugin:2.4:jar-no-fork (attach-sources) @
> beam-runners-apex ---
> [INFO] Building jar:  NightlySnapshot/ws/runners/apex/target/beam-runners-apex-
> 0.4.0-incubating-SNAPSHOT-sources.jar>
> [INFO]
> [INFO] --- maven-source-plugin:2.4:test-jar-no-fork (attach-test-sources)
> @ beam-runners-apex ---
> [INFO] Building jar:  NightlySnapshot/ws/runners/apex/target/beam-runners-apex-
> 0.4.0-incubating-SNAPSHOT-test-sources.jar>
> [INFO]
> [INFO] --- maven-jar-plugin:2.5:test-jar (default-test-jar) @
> beam-runners-apex ---
> [INFO] Building jar:  NightlySnapshot/ws/runners/apex/target/beam-runners-apex-
> 0.4.0-incubating-SNAPSHOT-tests.jar>
> [INFO]
> [INFO] --- maven-surefire-plugin:2.19.1:test (runnable-on-service-tests)
> @ beam-runners-apex ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
> beam-runners-apex ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]    commons-io:commons-io:jar:2.4:compile
> [WARNING]    com.datatorrent:netlet:jar:1.3.0:compile
> [WARNING]    org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Beam :: Parent .............................. SUCCESS [ 17.003 s]
> [INFO] Apache Beam :: SDKs :: Java :: Build Tools ......... SUCCESS [  7.625 s]
> [INFO] Apache Beam :: SDKs ................................ SUCCESS [  8.848 s]
> [INFO] Apache Beam :: SDKs :: Java ........................ SUCCESS [  4.922 s]
> [INFO] Apache Beam :: SDKs :: Java :: Core ..

Re: [DISCUSS] Graduation to a top-level project

2016-11-23 Thread Davor Bonaci
Thanks everyone for the enthusiastic support!

Please keep the thread going, as we kick off the process on private@.
Please don’t forget to bring up any data points that might help strengthen
our case.

Thanks!

On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner 
wrote:

> +1 (beaming)
>
> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
> 
> wrote:
>
> +1
>
> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik 
> wrote:
> > +1
> >
> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen  wrote:
> >
> >> +1
> >> The community is doing very well and behaving very Apache
> >>
> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot 
> >> wrote:
> >>
> >> > A big +1 of course, very excited to go forward
> >> >
> >> > Etienne
> >> >
> >> >
> >> >
> >> > Le 22/11/2016 à 19:19, Davor Bonaci a écrit :
> >> >
> >> >> Hi everyone,
> >> >> With all the progress we’ve had recently in Apache Beam, I think it
> is
> >> >> time
> >> >> we start the discussion about graduation as a new top-level project
> at
> >> the
> >> >> Apache Software Foundation.
> >> >>
> >> >> Graduation means we are a self-sustaining and self-governing
> community,
> >> >> and
> >> >> ready to be a full participant in the Apache Software Foundation. It
> >> does
> >> >> not imply that our community growth is complete or that a particular
> >> level
> >> >> of technical maturity has been reached, rather that we are on a solid
> >> >> trajectory in those areas. After graduation, we will still
> periodically
> >> >> report to, and be overseen by, the ASF Board to ensure continued
> growth
> >> of
> >> >> a healthy community.
> >> >>
> >> >> Graduation is an important milestone for the project. It is also key
> to
> >> >> further grow the user community: many users (incorrectly) see
> incubation
> >> >> as
> >> >> a sign of instability and are much less likely to consider us for a
> >> >> production use.
> >> >>
> >> >> A way to think about graduation readiness is through the Apache
> Maturity
> >> >> Model [1]. I think we clearly satisfy all the requirements [2]. It is
> >> >> probably worth emphasizing the recent community growth: over each of
> the
> >> >> past three months, no single organization contributing to Beam has
> had
> >> >> more
> >> >> than ~50% of the unique contributors per month [2, see assumptions].
> >> >> That’s
> >> >> a great statistic that shows how much we’ve grown our diversity!
> >> >>
> >> >> Process-wise, graduation consists of drafting a board resolution,
> which
> >> >> needs to identify the full Project Management Committee, and getting
> it
> >> >> approved by the community, the Incubator, and the Board. Within the
> Beam
> >> >> community, most of these discussions and votes have to be on the
> >> private@
> >> >> mailing list, but, as usual, we’ll try to keep dev@ updated as much
> as
> >> >> possible.
> >> >>
> >> >> With that in mind, let’s use this discussion on dev@ for two things:
> >> >> * Collect additional data points on our progress that we may want to
> >> >> present to the Incubator as a part of the proposal to accept our
> >> >> graduation.
> >> >> * Determine whether the community supports graduation. Please reply
> >> +1/-1
> >> >> with any additional comments, as appropriate. I’d encourage everyone
> to
> >> >> participate -- regardless whether you are an occasional visitor or
> have
> >> a
> >> >> specific role in the project -- we’d love to hear your perspective.
> >> >>
> >> >> Data points so far:
> >> >> * Project’s maturity self-assessment [2].
> >> >> * 1500 pull requests in incubation, which makes us one of the most
> >> >> active projects across all of ASF on this metric.
> >> >> * 3 releases, each driven by a different release manager.
> >> >> * 120+ individual contributors.
> >> >> * 3 new committers added, 2 of which aren’t from the largest
> >> organization.
> >> >> * 1027 issues created, 515 resolved.
> >> >> * 442 dev@ emails in October alone, sent by 51 individuals.
> >> >> * 50 user@ emails in the last 30 days, sent by 22 individuals.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Davor
> >> >>
> >> >> [1] http://community.apache.org/apache-way/apache-project-maturity-model.html
> >> >> [2] http://beam.incubator.apache.org/contribute/maturity-model/
> >> >>
> >> >>
> >> >
> >>
>


[DISCUSS] Graduation to a top-level project

2016-11-22 Thread Davor Bonaci
Hi everyone,
With all the progress we’ve had recently in Apache Beam, I think it is time
we start the discussion about graduation as a new top-level project at the
Apache Software Foundation.

Graduation means we are a self-sustaining and self-governing community, and
ready to be a full participant in the Apache Software Foundation. It does
not imply that our community growth is complete or that a particular level
of technical maturity has been reached, rather that we are on a solid
trajectory in those areas. After graduation, we will still periodically
report to, and be overseen by, the ASF Board to ensure continued growth of
a healthy community.

Graduation is an important milestone for the project. It is also key to
further grow the user community: many users (incorrectly) see incubation as
a sign of instability and are much less likely to consider us for a
production use.

A way to think about graduation readiness is through the Apache Maturity
Model [1]. I think we clearly satisfy all the requirements [2]. It is
probably worth emphasizing the recent community growth: over each of the
past three months, no single organization contributing to Beam has had more
than ~50% of the unique contributors per month [2, see assumptions]. That’s
a great statistic that shows how much we’ve grown our diversity!

Process-wise, graduation consists of drafting a board resolution, which
needs to identify the full Project Management Committee, and getting it
approved by the community, the Incubator, and the Board. Within the Beam
community, most of these discussions and votes have to be on the private@
mailing list, but, as usual, we’ll try to keep dev@ updated as much as
possible.

With that in mind, let’s use this discussion on dev@ for two things:
* Collect additional data points on our progress that we may want to
present to the Incubator as a part of the proposal to accept our graduation.
* Determine whether the community supports graduation. Please reply +1/-1
with any additional comments, as appropriate. I’d encourage everyone to
participate -- regardless whether you are an occasional visitor or have a
specific role in the project -- we’d love to hear your perspective.

Data points so far:
* Project’s maturity self-assessment [2].
* 1500 pull requests in incubation, which makes us one of the most active
projects across all of ASF on this metric.
* 3 releases, each driven by a different release manager.
* 120+ individual contributors.
* 3 new committers added, 2 of which aren’t from the largest organization.
* 1027 issues created, 515 resolved.
* 442 dev@ emails in October alone, sent by 51 individuals.
* 50 user@ emails in the last 30 days, sent by 22 individuals.

Thanks!

Davor

[1] http://community.apache.org/apache-way/apache-project-maturity-model.html
[2] http://beam.incubator.apache.org/contribute/maturity-model/


Re: Including Apex runner in Beam tutorial at Strata - Singapore

2016-11-15 Thread Davor Bonaci
Hi Sandeep,
It would be great to include the Apex runner as a part of any tutorial
going forward. I suspect we'll have the 0.4.0-incubating release completed
just before Strata Singapore, which will be the first release with the Apex
runner, so that aligns quite nicely.

Are you planning to attend Strata Singapore? If so, I'd encourage you to
reach out to Tyler Akidau offline, who's leading the tutorial at this
conference.

Davor

On Tue, Nov 15, 2016 at 7:04 AM, Jean-Baptiste Onofré 
wrote:

> Hi Sandeep,
>
> Great news !
>
> Yes, you can definitely do a demo using the Apex runner. It's what Dan and
> I are also planning during ApacheCon this week: same Wordcount example
> running on different execution engines.
>
> Maybe this blog could help you to prepare the demo:
> http://blog.nanthrax.net/2016/08/apache-beam-in-action-same-code-several-execution-engines/
>
> By the way, I will propose a PR to "merge" those blog posts into the Beam
> website.
>
> Regards
> JB
>
>
> On 11/15/2016 04:00 PM, Sandeep Deshmukh wrote:
>
>> Dear Beam Community,
>>
>> There is a Beam tutorial in Strata-Singapore. I would like to explore
>> possibility of including the Apex runner as a part of that tutorial. As
>> Apex runner is recently merged into master branch of Beam, it would be of
>> interest to many people.
>>
>> Please let us know if we can do so. I can accordingly work on the same.
>>
>> Regards,
>> Sandeep
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Configuring Jenkins

2016-11-15 Thread Davor Bonaci
Hi everybody,
As I'm sure everybody knows, we use Apache's Jenkins instance for all our
testing, including pre-commit, post-commit, nightly snapshot, etc. (Travis
CI is a backup system and recommended for individual forks only.)

Managing Jenkins projects has been a big pain point so far. Among other
reasons, only a few of us have access to configure it, way too few of us
have visibility into what those jobs do, and nobody has any visibility into
changes being made or an opportunity to comment on them.

Well, not any more! I was playing a little bit with Jenkins DSL plugin and
was able to move our configuration out of Jenkins and into the git
repository. I've done it as a proof of concept for the website repository
only [1], but Jason is planning on extending that work to the main
repository. Look for a PR shortly!

Going forward, anyone can see what our Jenkins jobs are doing, and anyone
can add new jobs or improve existing ones by simply proposing a pull
request to change the configuration. Finally, the project maintains a
history in source repository, instead of direct changes without much
accountability.

How does this work? There's a "seed" job that periodically applies
configuration specified in the source repository into Jenkins. Currently,
this happens once per day. If you modify the configuration in the source
repository, it will be applied within 24 hours. If you, however, modify the
configuration in Jenkins directly, it will revert back to whatever is
specified in the code repository also within 24 hours.
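
The reconciliation behaviour described above can be illustrated with a small
sketch. This is not the Job DSL plugin's API or its actual implementation; the
function name, the job names and the dictionary layout are all hypothetical,
and the sketch assumes that jobs defined only in Jenkins are left untouched.
It only shows the rule "whatever is in the source repository wins on the next
seed run".

```python
def apply_seed(repo_jobs, jenkins_jobs):
    """Hypothetical model of one seed-job run.

    repo_jobs:    job name -> configuration as defined in the source repository
    jenkins_jobs: job name -> configuration currently live in Jenkins
    Returns the new Jenkins state and a report of what the run changed.
    """
    report = {"created": [], "reverted": [], "unchanged": []}
    for name in sorted(repo_jobs):
        if name not in jenkins_jobs:
            report["created"].append(name)
        elif jenkins_jobs[name] != repo_jobs[name]:
            # A manual edit made directly in Jenkins is overwritten here.
            report["reverted"].append(name)
        else:
            report["unchanged"].append(name)

    # Assumption of this sketch: jobs that exist only in Jenkins are kept.
    new_state = dict(jenkins_jobs)
    new_state.update(repo_jobs)
    return new_state, report


# Hypothetical job names and settings, for illustration only.
repo = {"precommit": {"concurrent": True}, "website": {"schedule": "daily"}}
live = {"precommit": {"concurrent": False}, "manual_job": {}}
state, report = apply_seed(repo, live)
print(report)
```

In this model, a direct change to "precommit" in Jenkins shows up under
"reverted" after the next run, which matches the 24-hour revert described
above.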

How to understand Jenkins DSL? There are many resources available; I've
found Jenkins Job DSL API [2] particularly helpful.

I hope you are excited to have this feature available to us! If you have
any thoughts on improving this further, please comment. Thanks!

Davor

[1] https://github.com/apache/incubator-beam-site/pull/80
[2] https://jenkinsci.github.io/job-dsl-plugin/


Re: [PROPOSAL] Change to KafkaIO splits

2016-11-13 Thread Davor Bonaci
Luke is bringing up great questions, I think.

My first impression is that the current state is "possibly under-split",
and the proposal is to move us to "possibly over-split" state. Neither is
the ideal solution, as I'm sure we can find scenarios when either is not
performing well. That said, if we aren't really solving the problem (at
this time), I can believe that "over-split" is better than "under-split".

(Job update is not a Beam consideration at this time, so none of that
applies.)

Davor
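
For readers following the "under-split" versus "over-split" distinction, here
is a minimal sketch of the two strategies under discussion. It is plain Python,
not KafkaIO code: the function names are hypothetical, and the round-robin
assignment in the first function is an assumption made for illustration.

```python
def split_by_desired_workers(partitions, desired_num_workers):
    """Current behaviour as described in the thread: take the hint literally.

    Returns at most desired_num_workers splits, each reading several
    partitions. Round-robin assignment is an assumption of this sketch.
    """
    ordered = sorted(partitions)
    num_splits = max(1, min(desired_num_workers, len(ordered)))
    splits = [[] for _ in range(num_splits)]
    for index, partition in enumerate(ordered):
        splits[index % num_splits].append(partition)
    return splits


def split_per_partition(partitions):
    """Proposed behaviour: one split per Kafka partition, hint ignored."""
    return [[partition] for partition in sorted(partitions)]


partitions = list(range(50))
current = split_by_desired_workers(partitions, 10)
proposed = split_per_partition(partitions)
print(len(current), len(current[0]))    # number of splits, partitions per split
print(len(proposed), len(proposed[0]))
```

With the numbers from the proposal (50 partitions, a hint of 10), the first
function yields 10 splits of 5 partitions each, while the second always yields
50 single-partition splits regardless of the hint.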

On Fri, Nov 11, 2016 at 7:55 AM, Lukasz Cwik 
wrote:

> Why is it that we don't generate initial splits after the pipeline has been
> created and the runner is processing it?
>
> This would allow a runner to look at the old state of the pipeline and see
> how many splits there were.
> This would allow the runner to provide a hint as to how many splits it
> wants.
> This brings it in line with how bounded sources work where the splitting is
> performed once the pipeline has started.
>
> On Fri, Nov 11, 2016 at 8:09 AM, Amit Sela  wrote:
>
> > +1
> > I think this makes more sense than the existing form of a split that is
> > made of several Kafka partitions since, as mentioned, Kafka partitions
> > are in fact its parallelism.
> >
> > As for supporting a change in the number of partitions (mainly, added
> > partitions), I'll suggest something I brought up before, and might make
> > more sense now:
> > Hashing an UnboundedSource according to its split's properties
> > (topic-partition in this case). This will allow keying the stream by the
> > source in a way that the reader's CheckpointMark is tied to the split,
> > and if a "new split" is created (a new partition added to a topic the
> > pipeline consumes) its reader's state is non-existent (starting from
> > latest/earliest), while the rest (of the readers) will pick up where they
> > left off.
> > I think this also avoids the need to "remember" the original number of
> > parallelism.
> >
> > Thanks,
> > Amit
> >
> > On Fri, Nov 11, 2016 at 4:22 AM Raghu Angadi  >
> > wrote:
> >
> > > I would like to propose a change to how many splits (sources) KafkaIO
> > > creates. The code changes are relatively simple, but it has a couple of
> > > drawbacks I would like to discuss here.
> > >
> > > KafkaIO currently takes the '*desiredNumWorkers*' hint
> > > <https://github.com/apache/incubator-beam/blob/v0.3.0-incubating-RC1/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L642>
> > > literally and returns exactly that many splits. If *desiredNumWorkers*
> > > is 10, and the topic has 50 partitions, each Kafka source reads from 5
> > > partitions.
> > > The primary disadvantage is that the runner-dependent
> > > 'desiredNumWorkers' might not be accurate. In Dataflow, it is
> > > particularly low when we set 'maxNumWorkers' (BEAM-958).
> > > In addition, the number of partitions in Kafka is a really good
> > > indicator of its parallelism.
> > >
> > > I would like to change KafkaIO to return one split for each of the
> > > partitions.
> > >
> > > Pros:
> > >
> > >- A partition is in fact the unit of parallelism in Kafka.
> > >- Does not depend on 'desiredNumWorkers'.
> > >- Little risk of having an unreasonably large number of partitions
> > >  (unlike, say, a source with one split per file). The number of
> > >  partitions tends to be on the order of the Kafka cluster size.
> > >
> > > Cons: mainly affects job update:
> > >
> > >- Breaks updating an existing job if it is updated to a newer version
> > >  of KafkaIO. The new version changes the number of splits returned,
> > >  which is not allowed during update.
> > >   - I think this is a reasonable breakage at this stage.
> > >   - Vast majority of updates don't involve version change
> > >   - We could add a workaround where the user can explicitly set the
> > >   number of splits in KafkaIO (this might be required to handle a
> > >   change in partitions as well, see below)
> > >- Makes it a bit more difficult to support change in number of Kafka
> > >  partitions across an update.
> > >   - This is not a feature in KafkaIO yet. So not a new breakage.
> > >   - If we don't depend on 'desiredNumWorkers', there is no way for us
> > >   to know how many splits we had before the update. This is actually
> > >   a limitation of the UnboundedSource API. UnboundedSource needs
> > >   multiple tweaks to support job update better. In that sense I
> > >   don't think this should be a blocker.
> > >   - A workaround is to let the user explicitly set the number of
> > >   splits. E.g.
> > >  - when a job starts, say we had 70 partitions and after some time
> > >  we add 10 more partitions.
> > >  - At runtime, each Kafka split notices these

Re: Introduction + contributing to docs

2016-11-11 Thread Davor Bonaci
Welcome!

On Fri, Nov 11, 2016 at 11:11 AM, Melissa Pashniak <
meliss...@google.com.invalid> wrote:

> Hello!
>
>
> My name is Melissa. I’ve previously been involved with Dataflow
> documentation, and I’m excited to start contributing to the Beam project
> and documentation.
>
>
> I’ve written up some text for Beam’s direct runner and Cloud Dataflow
> runner pages, currently available in pull requests [1][2]. I am also
> working on the unfinished parts of the programming guide [3]. Let me know
> if you have any thoughts or feedback.
>
> I look forward to working with everyone in the community!
>
> Melissa
>
>
> [1] https://github.com/apache/incubator-beam-site/pull/76
> [2] https://github.com/apache/incubator-beam-site/pull/77
> [3] https://issues.apache.org/jira/browse/BEAM-193
>


Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-08 Thread Davor Bonaci
+1

I'd treat this as an official vote on this procedural matter.

On Tue, Nov 8, 2016 at 6:55 AM, Mukul Jain  wrote:

> +1
>
> Awesome work Thomas! More runner choices the better.
>
> Best
> Mukul
>
> Sent from my iPhone
>
> > On Nov 8, 2016, at 6:09 AM, Jean-Baptiste Onofré 
> wrote:
> >
> > +1
> >
> > Great work Thomas !!
> >
> > Regards
> > JB
> >
> >> On Nov 8, 2016, 14:54, at 14:54, Thomas Weise  wrote:
> >> Hi,
> >>
> >> As per previous discussion [1], I would like to propose to merge the
> >> apex-runner branch into master. The runner satisfies the criteria
> >> outlined
> >> in [2] and merging it to master will give more visibility to other
> >> contributors and users.
> >>
> >> Specifically the Apex runner addresses:
> >>
> >>  - Have at least 2 contributors interested in maintaining it, and 1
> >> committer interested in supporting it:  *I'm going to sign up for the
> >> support and there are more folks interested. Some have already
> >> contributed
> >> and helped with PR reviews, others from the Apex community have
> >> expressed
> >>  interest [3].*
> >> - Provide both end-user and developer-facing documentation:  *Runner
> >> has
> >> README, capability matrix, Javadoc. Planning to add it to the tutorial
> >>  later.*
> >>  - Have at least a basic level of unit test coverage:  *Has 30 runner
> >>  specific tests and passes all Beam RunnableOnService tests.*
> >>  - Run all existing applicable integration tests with other Beam
> >> components and create additional tests as appropriate: * Enabled runner
> >>  for examples integration tests in the same way as other runners.*
> >> - Be able to handle a subset of the model that address a significant
> >> set of
> >>  use cases (aka. ‘traditional batch’ or ‘processing time
> >> streaming’):  *Passes
> >>  RunnableOnService without exclusions and example IT.*
> >>  - Update the capability matrix with the current status:  *Done.*
> >> - Add a webpage under learn/runners: *Same "TODO" page as other runners
> >>  added to site.*
> >>
> >> The PR for the merge:
> >> https://github.com/apache/incubator-beam/pull/1305
> >>
> >> (There are intermittent test failures in individual Travis runs that
> >> are
> >> unrelated to the runner.)
> >>
> >> Thanks,
> >> Thomas
> >>
> >> [1]
> >> https://lists.apache.org/thread.html/2b420a35f05e47561f27c19e8ec6484f595553f32da88fe593ad931d@%3Cdev.beam.apache.org%3E
> >>
> >> [2]
> >> http://beam.apache.org/contribute/contribution-guide/#feature-branches
> >>
> >> [3]
> >> https://lists.apache.org/thread.html/6e7618768cdcde81c28aa9883a1fcf
> 4d3d4e41de4249547
> >>  4d3d4e41de4249547130691d52@%3Cdev.apex.apache.org%3E>
> >> 130691d52@%3Cdev.apex.apache.org%3E
> >>  4d3d4e41de4249547130691d52@%3Cdev.apex.apache.org%3E>
>


Re: Contributing to Beam docs

2016-11-04 Thread Davor Bonaci
>
> From the URL Hadar shared, I believe we are using GCS buckets to host the
> content. https://cloud.google.com/storage/docs/hosting-static-website has
> information about hosting static websites. Have we looked at that before?
> There's a section of that titled "Optional: Assigning pages" which has more
> information about editing website configuration. (you may need to change
> some config settings so that GCS knows it's a website before it'll give you
> those options)
>
> > p.s. I'd love to contribute to solving the /index.html thing. Seems like
> > something we should be able to engineer our way around.
>

I have experimented with automatic staging of website pull requests to
simplify reviews -- it is a work-in-progress due to the "/index.html thing".

In the current setup, we'd need a (sub-)domain with CNAME and a TXT entries
to solve the problem. I've floated the idea with Infra -- they weren't
enthusiastic about doing this as a subdomain of beam.incubator.apache.org.
Unless we choose to obtain another domain elsewhere, the only option I'm
aware of would be to try to "sed" the jekyll's output to fix the links --
somewhat fragile and non-great, but possible.
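
As a rough illustration of the "sed" idea, here is a minimal sketch of such a
link rewrite. It assumes the "/index.html thing" means that directory-style
links (ending in "/") need an explicit index.html when the generated site is
served from a plain bucket. The function name and the regular expression are
hypothetical, and a real fix would need to cover more link forms than this.

```python
import re

# Matches site-absolute links that end in "/" (for example href="/contribute/"),
# but not external links, protocol-relative links ("//host/..."), or links
# that already name a file.
_DIR_LINK = re.compile(r'href="(/(?!/)(?:[^"#?]*/)?)"')


def add_index_html(html):
    """Append index.html to directory-style links in generated HTML."""
    return _DIR_LINK.sub(lambda m: 'href="%sindex.html"' % m.group(1), html)


page = '<a href="/contribute/">Contribute</a> <a href="/css/site.css">css</a>'
print(add_index_html(page))
```

Because this operates on text rather than parsed HTML, it shares the fragility
mentioned above: any link written in a form the pattern does not anticipate is
silently left unchanged.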


Re: [ANNOUNCE] Beam 0.3.0-incubating Released

2016-10-31 Thread Davor Bonaci
Fantastic! Thanks Aljoscha.

On Mon, Oct 31, 2016 at 9:37 AM, Dan Halperin 
wrote:

> Wow! This is awesome, thanks Aljoscha. And congrats on the first release
> where RC1 went out successfully ;)
>
> Dan
>
> On Mon, Oct 31, 2016 at 9:36 AM, Aljoscha Krettek 
> wrote:
>
> > Congratulations, team! I just finalised everything for the most recent
> > release. The artefacts are on Maven, the website is updated and the
> source
> > release should slowly propagate through the Apache servers.
> >
> > I'll also send an email to the user list to highlight some of the new
> > features.
> >
> > Cheers,
> > Aljoscha
> >
>


Re: [PROPOSAL] New Beam website design?

2016-10-27 Thread Davor Bonaci
The best place to learn how to get started is the Contribution Guide [1].
The list of pending JIRA issues related to the website is also available
[2].

I think BEAM-752 would be the best to get your feet wet. Other good
candidates are 516, 268, 776. If someone knows a good (non-fragile)
solution to 751, that would be a great contribution!

Davor

[1] http://beam.incubator.apache.org/contribute/contribution-guide/
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20%3D%20website

On Thu, Oct 27, 2016 at 5:20 AM, Jean-Baptiste Onofré 
wrote:

> Great !! Thanks.
>
> You can take a look on BEAM-500 and 501 and also the PR I did last week.
>
> I plan to submit new PRs during the week end. So please let me know how we
> can sync.
>
> Thanks
> Regards
> JB
>
> On Oct 27, 2016, at 14:04, Minudika Malshan  wrote:
>>
>> Hi all,
>>
>> I would like to join for the development of the new site.
>> Is there any issue tracking method for this? (Are there any JIRA issues?)
>>
>> Thank you!
>>
>>
>>
>> On Thu, Oct 27, 2016 at 4:01 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>
>>
>>> Hi
>>>
>>> You can propose a PR on this Jira.
>>>
>>> We will be more than happy to review it.
>>>
>>> Thanks
>>> Regards
>>> JB
>>>
>>> On Oct 27, 2016, 11:26, at 11:26, Abdullah Bashir wrote:
>>>
>>>> Thank you very much for taking time to respond Davor :)
>>>>
>>>> Regarding BEAM-752, I can work on that, I have already built some
>>>> Dataflow pipelines on Google Cloud in Python.
>>>>
>>>> Again, can you tell me where to start for BEAM-752? I am new to ASF
>>>> contribution, so onboarding steps are kind of a black box to me :).
>>>>
>>>> On Thu, Oct 27, 2016 at 11:34 AM, Davor Bonaci wrote:
>>>>
>>>>> Absolutely!
>>>>>
>>>>> I'm currently reviewing JB's PR #51, and that should go in shortly.
>>>>> Within a day or so, I should have a better idea about future work in
>>>>> this specific area; please stay tuned.
>>>>>
>>>>> There are also separate things that are ready to be started at any
>>>>> time. BEAM-752 comes to mind first. Is this something you'd be
>>>>> interested in?
>>>>>
>>>>> On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir wrote:
>>>>>
>>>>>> Hi Davor,
>>>>>>
>>>>>> I am done with my local setup to start contributing. I have forked
>>>>>> and merged pull request *(**pull/51)* into my local repo. Then I read
>>>>>> the Google docs; there are two tasks mentioned in it, [Beam-500] and
>>>>>> [Beam-501].

Re: Can we have more quick start examples ?

2016-10-27 Thread Davor Bonaci
Indeed -- this is a clear area for improvement. Sources are usually not as
big of an issue -- these resources are publicly accessible regardless
where/how you run the pipeline (locally, or with any runner). On the other
hand, Sinks require write access, which is often more problematic.

One correction, however: WordCount supports both GCS and local paths, with
some exceptions depending on a runner.

There are several efforts to improve this, most notably BEAM-59, which is
assigned to Pei.

On Thu, Oct 27, 2016 at 8:17 AM, Jesse Anderson 
wrote:

> Those tutorials help. I was going through the example code and had the same
> thought. We need to take a pass through the examples and remove some of the
> Google Cloud dependencies.
>
> On Thu, Oct 27, 2016, 5:13 PM Thomas Weise  wrote:
>
> > The Beam tutorials seem to address this:
> >
> > https://github.com/eljefe6a/beamexample/blob/master/README.md
> >
> >
> > On Thu, Oct 27, 2016 at 8:04 AM, Manu Zhang 
> > wrote:
> >
> > > Hey guys,
> > >
> > > I find Beam examples under the examples folder are not easy to run due
> > > to dependency on Google specific services. Even the MinimalWordCount
> > > (examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java)
> > > requires input and output to be on Google Cloud Storage. Others like
> > > WindowedWordCount
> > > (examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java)
> > > require BigQuery. I wouldn't expect newcomers to tweak IO themselves.
> > >
> > > Can we have more quick start examples that can be run anywhere ?
> > >
> > > Thanks,
> > > Manu Zhang
> > >
> >
>


Re: [PROPOSAL] New Beam website design?

2016-10-26 Thread Davor Bonaci
Absolutely!

I'm currently reviewing JB's PR #51, and that should go in shortly. Within
a day or so, I should have a better idea about future work in this specific
area; please stay tuned.

There are also separate things that are ready to be started at any time.
BEAM-752 comes to mind first. Is this something you'd be interested in?

On Wed, Oct 26, 2016 at 11:17 PM, Abdullah Bashir 
wrote:

> Hi Davor,
>
> I am done with my local setup to start contributing. I have forked and
> merged pull request *(**pull/51)* into my local repo. Then I read the
> Google docs; there are two tasks mentioned in it, [Beam-500] and
> [Beam-501].
> I found out that [Beam-500] is closed in JIRA and [Beam-501] is
> assigned to Jean-Baptiste Onofré. Is there any task that you can assign
> to me?
>
> Thanks.
>
> Regards,
> Abdullah Bashir
>
>
> On Tue, Oct 25, 2016 at 1:50 AM, Davor Bonaci  wrote:
>
> > Abdullah, welcome!
> >
> > I think it's rather clear we've been struggling with the website, so any
> > help is very welcome. It is a little bit messy right now -- there are a
> > few outstanding pull requests and forked branches. I'm trying to get all
> > this into one place, so anybody can contribute and make progress.
> >
> > Also, the general website organization has been discussed before, see
> > this thread [1] and the attached document for details.
> >
> > Davor
> >
> > [1]
> > https://mail-archives.apache.org/mod_mbox/beam-dev/201606.mbox/%3CCAAzyFAwu992x+xcxN6Ha-avKZZbF-RK00mUg1-vezYCmtOm4Ww@mail.gmail.com%3E
> >
> > On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Hi
> > >
> > > You can take a look at the PR I created last Friday. It contains a
> > > CSS/skin proposal.
> > >
> > > The mock-up is there: http://maven.nanthrax.net/beam
> > >
> > > Regards
> > > JB
> > >
> > > On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir <mabdullah...@gmail.com>
> > > wrote:
> > > >Hi,
> > > >
> > > >Is there any help I can give with the website design?
> > > >I am good at HTML5, CSS3 and javascript.
> > > >
> > > >Regards,
> > > >Abdullah Bashir
> > >
> >
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Davor Bonaci
In terms of reaching a decision on any code or design changes, including
this one, I'd suggest going without formal votes. The voting process for
code modifications between choices A and B doesn't necessarily end with a
decision A or B -- a single (qualified) -1 vote is a veto and cannot be
overridden [1]. Said differently, the guideline is that code changes should
be made by consensus, not by one group outvoting another. I'd like to avoid
setting such a precedent; we should try to drive consensus, as opposed to
attempting to outvote another part of the community.

In this particular case, we have had a great discussion. Many contributors
brought different perspectives. Consequently, some opinions have likely
changed. At this point, someone should summarize the arguments, try
to critique them from a neutral standpoint, and suggest a refined proposal
that takes these perspectives into account. If nobody objects in a short
time, we should consider this decided. [ I can certainly help here, but I'd
love to see somebody else do it! ]

[1] http://www.apache.org/foundation/voting.html
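
To make the veto rule concrete, here is a minimal Python sketch of how a
code-modification vote resolves under the guideline linked in [1]. The
function name, the string outcomes, and the default of three +1 votes are
illustrative assumptions, and the sketch assumes every -1 passed in is
already a qualified (justified, binding) veto.

```python
def code_change_outcome(binding_votes, required_plus_ones=3):
    """Resolve a code-modification vote (hypothetical helper).

    binding_votes: list of +1, 0 or -1 cast by binding voters.
    A single -1 is a veto, no matter how many +1 votes exist.
    """
    if any(vote == -1 for vote in binding_votes):
        return "vetoed"
    plus_ones = sum(1 for vote in binding_votes if vote == 1)
    if plus_ones >= required_plus_ones:
        return "approved"
    return "undecided"


# Five +1 votes cannot outvote a single veto.
print(code_change_outcome([1, 1, 1, 1, 1, -1]))
```

This is why a formal A-versus-B vote may settle nothing: one veto against
either choice blocks it, so the discussion has to converge anyway.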

On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers 
wrote:

> I also like Distinct since it doesn't make it sound like it modifies any
> underlying collection. RemoveDuplicates makes it sound like the duplicates
> are removed, rather than a new PCollection without duplicates being
> returned.
>
> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré 
> wrote:
>
> > Agree. It was more a transition proposal.
> >
> > Regards
> > JB
> >
> >
> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw wrote:
> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré wrote:
> > >> And what about using RemoveDuplicates and creating an alias Distinct ?
> > >
> > >I'd really like to avoid (long term) aliases--you end up having to
> > >document (and maintain) them both, and it adds confusion as to which
> > >one to use (especially if they ever diverge), and means searching for
> > >one or the other yields half the results.
> > >
> > >> It doesn't break the API and would address both SQL users and more
> > >> "big data" users.
> > >>
> > >> My $0.01 ;)
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin wrote:
> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> > >>>preference:
> > >>>
> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords are in
> > >>>the Javadoc. This reduces churn on our users and is honestly pretty dang
> > >>>descriptive.
> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
> > >>>less clear otherwise. This is a backwards-incompatible API change, so we
> > >>>should do it before we go stable.
> > >>>
> > >>>I am not super strong that 1 > 2, but I am very strong that "Distinct"
> > >>>>>> "MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
> > >>>
> > >>>Dan
> > >>>
> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> > >>>
> > >>>wrote:
> > >>>
> > >>>> The precedent that we use verbs has many exceptions. We have
> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
> > >>>> include Sum (at least when I read it).
> > >>>>
> > >>>> Historical note: the predilection towards verbs is from the Google Style
> > >>>> Guide for Java method names <... 2.3-method-names>,
> > >>>> which states "Method names are typically verbs or verb phrases". But even
> > >>>> in Google code there are lots of exceptions when it makes sense, like
> > >>>> Guava's Iterables.any(), Iterables.all(), Iterables.toArray(), the entire
> > >>>> Predicates module, etc. Just an aside; Beam isn't Google code. I suggest
> > >>>> we use our judgment rather than a policy.
> > >>>>
> > >>>> I think "Distinct" is one of those exceptions. It is a standard widespread
> > >>>> name and also reads better as an adjective. I prefer it, but also don't
> > >>>> care strongly enough to change it or to change it back :-)
> > >>>>
> > >>>> If we must have a verb, I like it as-is more than MakeDistinct and
> > >>>> AvoidDuplicate.
> > >>>>
> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson wrote:
> > >>>>
> > >>>> > My original thought for this change was that Crunch uses the class name
> > >>>> > Distinct. SQL also uses the keyword distinct.
> > >>>> >
> > >>>> > Maybe the rule should be changed to adjectives or verbs depending on the
> > >>>> > context.
> > >>>> >
> > >>>> > Using a verb to describe this class really doesn't connote what the class
> > >>>> > does as succinctly as the adjective.
> > >>>> >
> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian wrote:
> > >>>> >
> > >>>> > > Hello,
> > >>>> > >
> > >>>> > > First of all, thank you to Daniel, Robert and Jes

Re: [DISCUSS] Merging master -> feature branch

2016-10-26 Thread Davor Bonaci
+1

I concur it is fine to proceed with a downstream integration (master ->
feature branch -> sub-feature branch) without waiting for review for a
completely clean merge. Exactly as proposed -- I think there should still
be a pull request and comment saying it is a clean merge. (In some ideal
world, this would happen nightly by a tool automatically, but I think
that's not feasible in the short term.)

I think other cases (upstream integration, merge conflict, any manual
action, etc.) should still wait for a normal review.

On Wed, Oct 26, 2016 at 10:34 AM, Thomas Weise  wrote:

> +1
>
> For a merge from master to the feature branch that does not require extra
> changes, RTC does not add value. It actually delays and burns reviewer time
> (even mechanics need some) that "real" PRs could benefit from. If
> adjustments are needed, then the regular process kicks in.
>
> Thanks,
> Thomas
>
>
> On Wed, Oct 26, 2016 at 1:33 AM, Amit Sela  wrote:
>
> > I generally agree with Kenneth.
> >
> > While working on the SparkRunnerV2 branch, it was a pain - I avoided
> > frequent merges to avoid trivial PRs, but it cost me very large and
> > non-trivial merges later.
> > I think that frequent merges for feature-branches should most of the time
> > be trivial (no conflicts) and a committer should be allowed to self-merge
> > once tests pass.
> > As for conflicts, even for the smallest ones I'd go with review just so
> > it's very clear when self-merging is OK - we can always revisit this
> > later and further discuss if we think we can improve this process.
> >
> > I guess +1 from me.
> >
> > Thanks,
> > Amit.
> >
> > On Wed, Oct 26, 2016 at 8:10 AM Frances Perry 
> > wrote:
> >
> > > On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré wrote:
> > >
> > > > Agree. When possible it would be great to have the branch merged on
> > > > master quickly, even when it's not fully ready. It would give more
> > > > visibility to potential contributors.
> > > >
> > >
> > > This thread is about the opposite, I think -- merging master into feature
> > > branches regularly to prevent them from getting out of sync.
> > >
> > > As for increasing the visibility of feature branches, we have these new
> > > webpages:
> > > http://beam.incubator.apache.org/contribute/work-in-progress/
> > > http://beam.incubator.apache.org/contribute/contribution-guide/#feature-branches
> > > with more changes coming in the basic SDK/Runner landing pages too.
> > >
> >
>


Re: Apex runner status and next steps

2016-10-26 Thread Davor Bonaci
+1.

I have nothing to add -- with those three bullets resolved, I think we can
move forward with the merge to master.

On Wed, Oct 26, 2016 at 10:24 AM, Thomas Weise  wrote:

> Hi,
>
> The Apex runner is currently in a feature branch:
>
> https://github.com/apache/incubator-beam/tree/apex-runner
>
> Focus till here has been on functional completeness. It passes all the
> integration tests.
>
> Apex with its stateful stream processing architecture can support all of
> the concepts in the Beam model (event time, triggers, watermarks etc.).
> Most of these are already supported through the Beam SDK. The glue code
> that had to be written isn't that much, which speaks to the conceptual
> alignment in general.
>
> The runner in its current form does not leverage all the performance and
> scalability that Apex can deliver. We expect to address this with future
> contributions, leveraging things like incremental checkpointing,
> partitioning and operator affinity from Apex.
>
> From a code perspective, the runner should be close to what is needed for a
> merge to master (based on the contribution guidelines). The following items
> have been identified as prerequisite:
>
> * Add a README.md to the runner directory that summarizes its current state
> * Update the https://beam.apache.org/learn/runners/capability-matrix/ to
> include the Apex info
> * Create the page under learn/runners (at least the place holder)
>
> It should also be noted that the integration tests currently take quite
> long to run with embedded Apex (~50 minutes). Some of that has to do with
> how completion of the tests is determined and there are ideas to improve
> it.
>
> I have created some JIRAs from my TODO list of follow-up work for more
> contributors to get involved:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
>
> Some folks on the Apex dev list have expressed interest to take up some of
> this work. And thanks to Ismaël Mejía for BEAM-815!
>
> I'm looking forward to your comments and suggestions.
>
> Thanks,
> Thomas
>


Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Davor Bonaci
+1 (binding)

My understanding is that Kinesis licensing is not an issue since we don't
redistribute that code ourselves. (I'd also be fine with excluding that code
from the source distribution if deemed necessary.)

On Tue, Oct 25, 2016 at 11:45 AM, Dan Halperin 
wrote:

> I can't tell whether it is a problem that we are distributing the
> beam-sdks-java-io-kinesis module [0].
>
> Here is the dev@ discussion thread [1] and the (unanswered) relevant LEGAL
> thread [2].
> We linked through to a Spark-related discussion [3], and here is how to
> disable distribution of the KinesisIO module [4].
>
> [0]
> https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-kinesis/
> [1]
> https://lists.apache.org/thread.html/6784bc005f329d93fd59d0f8759ed4745e72f105e39d869e094d9645@%3Cdev.beam.apache.org%3E
> [2]
> https://issues.apache.org/jira/browse/LEGAL-198?focusedCommentId=15471529&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15471529
> [3] https://issues.apache.org/jira/browse/SPARK-17418
> [4] https://github.com/apache/spark/pull/15167/files
>
> Dan
>
> On Tue, Oct 25, 2016 at 11:01 AM, Seetharam Venkatesh <
> venkat...@innerzeal.com> wrote:
>
> > +1
> >
> > Thanks!
> >
> > On Mon, Oct 24, 2016 at 2:30 PM Aljoscha Krettek 
> > wrote:
> >
> > > Hi Team!
> > >
> > > Please review and vote at your leisure on release candidate #1 for
> > > version 0.3.0-incubating, as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > * all artifacts to be deployed to the Maven Central Repository [3],
> > > * source code tag "v0.3.0-incubating-RC1" [4],
> > > * website pull request listing the release and publishing the API
> > > reference manual [5].
> > >
> > > Please keep in mind that this release is not focused on providing new
> > > functionality. We want to refine the release process and make stable
> > > source and binary artefacts available to our users.
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PPMC affirmative votes.
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > [1]
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338051
> > > [2]
> > > https://dist.apache.org/repos/dist/dev/incubator/beam/0.3.0-incubating/
> > > [3]
> > > https://repository.apache.org/content/repositories/staging/org/apache/beam/
> > > [4]
> > > https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=5d86ff7f04862444c266142b0d5acecb5a6b7144
> > > [5] https://github.com/apache/incubator-beam-site/pull/52
> > >
> >
>
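
The adoption rule quoted in this thread ("majority approval, with at least
3 PPMC affirmative votes") can be sketched in Python as follows. The
function name and the vote encoding are hypothetical, and the sketch
assumes only PPMC (binding) votes are passed in; non-binding votes are
welcome on the list but are not tallied here.

```python
def release_vote_passes(ppmc_votes, min_plus_ones=3):
    """Tally a release vote (hypothetical helper).

    ppmc_votes: list of +1, 0 or -1 cast by PPMC members.
    Passes with at least `min_plus_ones` +1 votes and more +1 than -1.
    """
    plus_ones = sum(1 for vote in ppmc_votes if vote == 1)
    minus_ones = sum(1 for vote in ppmc_votes if vote == -1)
    return plus_ones >= min_plus_ones and plus_ones > minus_ones


# Three +1 votes outweigh one -1: a release vote has no veto.
print(release_vote_passes([1, 1, 1, -1]))
```

Unlike a code-modification vote, a -1 here only counts against the
majority; it does not block the release on its own.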


Re: Maven Release Plugin Does Not Update Version of Archetypes

2016-10-24 Thread Davor Bonaci
It sounds like the quoted commit has the right behavior; reverting it now
would make the archetype fragile.

I guess we should revert it just before declaring the first stable version;
until then we'll have to take a manual action during each release.

On Mon, Oct 24, 2016 at 12:56 PM, Lukasz Cwik 
wrote:

> Archetypes using SNAPSHOT makes the development process consistent and has
> us maintain the archetype code as the SDK code changes.
> I feel as though it's important to maintain the archetypes in sync with the
> SDK code as we introduce backwards incompatible changes.
> Also, because the SDK is changing, it seems worthwhile to have archetypes
> that are as current as possible for new users.
>
>
> On Mon, Oct 24, 2016 at 12:04 PM, Aljoscha Krettek 
> wrote:
>
> > Hi,
> > to unblock the release I'm changing the version manually now, yes. Would
> > be good to fix though.
> >
> > Cheers,
> > Aljoscha
> >
> > On Mon, 24 Oct 2016 at 20:30 Dan Halperin 
> > wrote:
> >
> >> Hmm, this is new in 0.3.0, looks caused by
> >> https://github.com/apache/incubator-beam/commit/1f30255edcdd9c1e445b69248191c8552724f086#diff-4795b1d27449c01332aad192348eL111
> >>
> >> Thinking if we can revert this part of the commit. Pei, Luke -- remember
> >> what's up?
> >>
> >> On Mon, Oct 24, 2016 at 11:17 AM, Dan Halperin 
> >> wrote:
> >>
> >> > Would it unblock the release to manually configure the version in the
> >> > 0.3.0-release branch?
> >> >
> >> > On Mon, Oct 24, 2016 at 11:09 AM, Dan Halperin 
> >> > wrote:
> >> >
> >> >> Correct issue link: https://issues.apache.org/jira/browse/BEAM-806
> >> >>
> >> >> No answers, but looking around.
> >> >>
> >> >> On Mon, Oct 24, 2016 at 10:10 AM, Aljoscha Krettek <aljos...@apache.org>
> >> >> wrote:
> >> >>
> >> >>> Hi,
> >> >>> are there any Maven mavens who happen to know how
> >> >>> https://issues.apache.org/jira/browse/BEAM-108 can be fixed? By the way,
> >> >>> the release plugin does also not update the version of the archetypes
> >> >>> when setting the next SNAPSHOT version.
> >> >>>
> >> >>> IMHO, it's a bit of a release blocker so I'm hoping we can get this
> >> >>> sorted quickly. I did some preliminary research but couldn't find a
> >> >>> solution but if no-one knows how to fix it it seems I have to dig
> >> >>> deeper myself.
> >> >>> Cheers,
> >> >>> Aljoscha
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >
>


Re: Tracking backward-incompatible changes for Beam

2016-10-24 Thread Davor Bonaci
I don't think we have it right now. We should, of course, but this is
something that needs to be defined/discussed first.

On Mon, Oct 24, 2016 at 1:20 PM, Neelesh Salian 
wrote:

> +1 for the labels and also a need for tests.
> Do we document any rules for backward-compatibility? Be good to have a
> checklist-like list.
>
>
>
>
> On Mon, Oct 24, 2016 at 1:02 PM, Davor Bonaci 
> wrote:
>
> > It would be awesome to have that! At least a good portion of
> > backward-incompatible changes could be automatically caught.
> >
> > We should also think about defining backward-compatibility more
> > precisely. This would be good in its own right, but also necessary to
> > configure the tool. Historically, we have applied the
> > backward-compatibility rules on APIs that are intended for users,
> > excluding experimental ones, but not necessarily on all publicly visible
> > APIs. If we continue this practice, it might be a challenge for the tool.
> > In any case, I think there's a good discussion to be had around what
> > backward-compatibility means exactly in Beam.
> >
> > On Sat, Oct 22, 2016 at 2:47 AM, Aljoscha Krettek 
> > wrote:
> >
> > > Very good idea!
> > >
> > > Should we already start thinking about automatic tests for backwards
> > > compatibility of the API?
> > >
> > > On Fri, 21 Oct 2016 at 10:56 Jean-Baptiste Onofré 
> > wrote:
> > >
> > > > Hi Dan,
> > > >
> > > > +1, good idea.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 10/21/2016 02:21 AM, Dan Halperin wrote:
> > > > > Hey everyone,
> > > > >
> > > > > > In the Beam codebase, we’ve improved, rewritten, or deleted many
> > > > > > APIs. While this has improved the model and gives us great freedom
> > > > > > to experiment, we are also causing churn on users authoring Beam
> > > > > > libraries and pipelines.
> > > > > >
> > > > > > To really kick off Beam as something users can depend on, we need to
> > > > > > stabilize the Beam API. Stabilizing means a commitment to not making
> > > > > > breaking changes -- except between major versions as per standard
> > > > > > semantic versioning.
> > > > > >
> > > > > > To get there, I’ve started a process for tracking these changes by
> > > > > > applying the `backward-incompatible` label [1] to the corresponding
> > > > > > JIRA issues. Naturally, open `backward-incompatible` changes are
> > > > > > “blocking issues” for the first stable release. (Or we’ll have to
> > > > > > put them off for the next major version!)
> > > > > >
> > > > > > So here are some requests for help:
> > > > > > * Please review and appropriately label the components I skipped:
> > > > > > runner-{apex, flink, gearpump, spark}, sdk-py.
> > > > > > * Please proactively file JIRA issues for breaking API changes you
> > > > > > still want to make, and label them.
> > > > > >
> > > > > > Thanks everyone!
> > > > > > Dan
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20backward-incompatible
> > > > >
> > > >
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>


Re: [PROPOSAL] New Beam website design?

2016-10-24 Thread Davor Bonaci
Abdullah, welcome!

I think it's rather clear we've been struggling with the website, so any
help is very welcome. It is a little bit messy right now -- there are a few
outstanding pull requests and forked branches. I'm trying to get all this
into one place, so anybody can contribute and make progress.

Also, the general website organization has been discussed before, see this
thread [1] and the attached document for details.

Davor

[1]
https://mail-archives.apache.org/mod_mbox/beam-dev/201606.mbox/%3ccaazyfawu992x+xcxn6ha-avkzzbf-rk00mug1-vezycmtom...@mail.gmail.com%3E

On Sun, Oct 23, 2016 at 12:34 AM, Jean-Baptiste Onofré 
wrote:

> Hi
>
> You can take a look at the PR I created last Friday. It contains a
> CSS/skin proposal.
>
> The mock-up is there: http://maven.nanthrax.net/beam
>
> Regards
> JB
>
> On Oct 23, 2016, 09:27, at 09:27, Abdullah Bashir
> wrote:
> >Hi,
> >
> >Is there any help I can give with the website design?
> >I am good at HTML5, CSS3 and javascript.
> >
> >Regards,
> >Abdullah Bashir
>


Re: Tracking backward-incompatible changes for Beam

2016-10-24 Thread Davor Bonaci
It would be awesome to have that! At least a good portion of
backward-incompatible changes could be automatically caught.

We should also think about defining backward-compatibility more precisely.
This would be good in its own right, but also necessary to configure the
tool. Historically, we have applied the backward-compatibility rules on
APIs that are intended for users, excluding experimental ones, but not
necessarily on all publicly visible APIs. If we continue this practice, it
might be a challenge for the tool. In any case, I think there's a good
discussion to be had around what backward-compatibility means exactly in
Beam.
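
As a rough illustration of what such a tool would have to be configured to
do, here is a small Python sketch that compares two API snapshots and
reports removals and signature changes, skipping anything marked
experimental. The snapshot format, the function name, and the sample
entries are all assumptions made for this example; they are not an
existing Beam tool or its configuration.

```python
def breaking_changes(old_api, new_api):
    """Report backward-incompatible differences (hypothetical helper).

    Each snapshot maps an API name to a dict with:
      "signature":    a string describing the API's shape
      "experimental": True if the API is exempt from the rules
    """
    problems = []
    for name, old in sorted(old_api.items()):
        if old["experimental"]:
            continue  # experimental APIs may change freely
        new = new_api.get(name)
        if new is None:
            problems.append("removed: " + name)
        elif new["signature"] != old["signature"]:
            problems.append("changed: " + name)
    return problems


# Illustrative snapshots; the signatures are made up for the example.
old = {
    "RemoveDuplicates": {"signature": "create()", "experimental": False},
    "TestStream": {"signature": "create(coder)", "experimental": True},
}
new = {
    "Distinct": {"signature": "create()", "experimental": False},
}
print(breaking_changes(old, new))
```

The hard part is not the comparison but deciding which names go into the
snapshot at all, which is exactly the definition that still needs
discussion.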

On Sat, Oct 22, 2016 at 2:47 AM, Aljoscha Krettek 
wrote:

> Very good idea!
>
> Should we already start thinking about automatic tests for backwards
> compatibility of the API?
>
> On Fri, 21 Oct 2016 at 10:56 Jean-Baptiste Onofré  wrote:
>
> > Hi Dan,
> >
> > +1, good idea.
> >
> > Regards
> > JB
> >
> > On 10/21/2016 02:21 AM, Dan Halperin wrote:
> > > Hey everyone,
> > >
> > > In the Beam codebase, we’ve improved, rewritten, or deleted many APIs.
> > > While this has improved the model and gives us great freedom to
> > > experiment, we are also causing churn on users authoring Beam libraries
> > > and pipelines.
> > >
> > > To really kick off Beam as something users can depend on, we need to
> > > stabilize the Beam API. Stabilizing means a commitment to not making
> > > breaking changes -- except between major versions as per standard
> > > semantic versioning.
> > >
> > > To get there, I’ve started a process for tracking these changes by
> > > applying the `backward-incompatible` label [1] to the corresponding JIRA
> > > issues. Naturally, open `backward-incompatible` changes are “blocking
> > > issues” for the first stable release. (Or we’ll have to put them off for
> > > the next major version!)
> > >
> > > So here are some requests for help:
> > > * Please review and appropriately label the components I skipped:
> > > runner-{apex, flink, gearpump, spark}, sdk-py.
> > > * Please proactively file JIRA issues for breaking API changes you still
> > > want to make, and label them.
> > >
> > > Thanks everyone!
> > > Dan
> > >
> > >
> > > [1]
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20labels%20%3D%20backward-incompatible
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


[ANNOUNCEMENT] New committers!

2016-10-21 Thread Davor Bonaci
Hi everyone,
Please join me and the rest of Beam PPMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.

* Thomas Weise
Thomas authored the Apache Apex runner for Beam [1]. This is an exciting
new runner that opens a new user base. It is a large contribution, which
starts a whole new component with great potential.

* Jesse Anderson
Jesse has contributed significantly by promoting Beam. He has co-developed
a Beam tutorial and delivered it at a top big data conference. He published
several blog posts positioning Beam, a Q&A with the Apache Beam team, and a
demo video on how to run Beam on multiple runners [2]. On the side, he has
authored 7 pull requests and reported 6 JIRA issues.

* Thomas Groh
Since starting incubation, Thomas has contributed the most commits to the
project [3]: a total of 226 commits. He has contributed broadly to the
project, most significantly by developing from scratch the DirectRunner
that supports the full model semantics. Additionally, he has contributed a
new set of APIs for testing unbounded pipelines. He published a blog post
highlighting this work.

Congratulations to all three! Welcome!

Davor

[1] https://github.com/apache/incubator-beam/tree/apex-runner
[2] http://www.smokinghand.com/
[3] https://github.com/apache/incubator-beam/graphs/contributors?from=2016-02-01&to=2016-10-14&type=c


Re: [DISCUSS] Executing (Jenkins) RunnableOnService tests more efficiently.

2016-10-20 Thread Davor Bonaci
I'd be hugely in favor; however, this is not what Apache Jenkins supports
right now, AFAIK. I asked Infra about this a while ago, but nothing has
moved yet. There was also a Jira issue about it, INFRA-11610.

On Thu, Oct 20, 2016 at 12:24 PM, Amit Sela  wrote:

> Hi all,
>
> I'd like to discuss options to execute ROS tests (per runner) more
> efficiently, and explore the option of running them on PreCommit, as
> opposed to PostCommit as they run today.
>
> The SDK provides a set of tests called "RunnableOnService" (aka ROS) that
> can be applied to a runner and validate it (correctly) supports SDK
> features.
> It's 300+ tests in total (batch + streaming) and it clearly takes time, and
> that is why it runs on PostCommit.
> I think we should look for a configuration where this is executed
> more efficiently and if possible run on PreCommit since runners are
> encouraged to rely on those tests and it's better to know of breaking
> changes beforehand.
>
> This came up somewhere in this conversation, and the highlights are
> basically:
>
> Kenneth Knowles suggested we might parallelize sub-builds in the following
> manner:
>
>1. Run unit tests.
>2. (sub tasks) Run ROS tests for each runner in parallel, skipping unit
>tests.
>
> I was wondering if we could set up Jenkins to run ROS per runner only if
> there was a code change for that runner - of course SDK changes will
> probably have to run ROS for all runners, but that might still be an
> optimization.
>
> I think one of Beam's best selling points is its extensive testing
> framework, and the fact that runners can be validated across capabilities,
> but it would be best to know of runner-breaking changes before merging to
> master.
> Thoughts ?
>
> Thanks,
> Amit
>
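
The per-runner selection idea in the quoted message can be sketched in
Python as below. This is only an illustration under stated assumptions: the
runner list, the `runners/<name>/` and `sdks/` directory prefixes, and the
function name are hypothetical, and nothing here reflects what Apache
Jenkins can actually be configured to do.

```python
# Assumed set of runners; adjust to whatever the build actually contains.
RUNNERS = ("apex", "flink", "spark")


def runners_to_validate(changed_paths, runners=RUNNERS):
    """Pick which runners need a ROS run for a set of changed files.

    Assumes runner code lives under runners/<name>/ and SDK code under
    sdks/. An SDK change, or a change to shared runner code, selects
    every runner; a change inside one runner selects only that runner.
    """
    selected = set()
    for path in changed_paths:
        if path.startswith("sdks/"):
            return sorted(runners)
        if path.startswith("runners/"):
            name = path.split("/")[1]
            if name not in runners:
                return sorted(runners)  # shared runner code: test all
            selected.add(name)
    return sorted(selected)


print(runners_to_validate(["runners/spark/src/main/java/Foo.java"]))
print(runners_to_validate(["sdks/java/core/src/main/java/Bar.java"]))
```

Changes that touch neither prefix (documentation, for instance) select no
runner in this sketch, which is where most of the saving would come from.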


Re: Start of release 0.3.0-incubating

2016-10-20 Thread Davor Bonaci
It's been a while since the last release, and I think we have accumulated
plenty of improvements across the board [1]. There are new IOs to be
released, performance improvements, and a ton of fixes.

As a general principle, I'm always advocating for delaying releases when
there are outstanding bug fixes. For new features, however, I'm usually on
the fence. It happens sometimes that new features are rushed to make a
release, then we discover important issues later on, and sometimes regret
the decision.

Of course, UnboundedSource for the SparkRunner and MqttIO would be
additional great improvements, and we should get that out to our users as
soon as possible too.

In this particular case, I think it is perfectly reasonable either to:
* try to get 0.3.0 out now and follow it quickly with 0.4.0, as soon as
these improvements are ready, or
* delay the release, but with a specific time box of a few days.

I'd give some preference to the first option now, since it is important to
keep a cadence of releases during incubation and build experience with the
process. If we were post-graduation, I'd almost certainly give a preference
to the second approach.

Davor

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338051

On Thu, Oct 20, 2016 at 9:32 AM, Amit Sela  wrote:

> +1
>
> I would like to have my standing PRs merged please - they should provide
> support for UnboundedSource for the SparkRunner.
> If it won't be ready for merge at the beginning of next week, don't hold
> for me.
>
> Thanks,
> Amit
>
> On Thu, Oct 20, 2016 at 7:27 PM Jean-Baptiste Onofré 
> wrote:
>
> > +1
> >
> > Thanks Aljoscha !!
> >
> > Would you mind waiting until the weekend or Monday to start the release?
> > I would like to include MqttIO if possible.
> >
> > Thanks !
> > Regards
> > JB
> >
> > On Oct 20, 2016, 18:07, at 18:07, Dan Halperin wrote:
> > >On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek wrote:
> > >
> > >> Hi,
> > >> thanks for taking the time and writing this extensive doc!
> > >>
> > >> If no-one is against this I would like to be the release manager for
> > >the
> > >> next (0.3.0-incubating) release. I would work with the guide and
> > >update it
> > >> with anything that I learn along the way. Should I open a new thread
> > >for
> > >> this or is it ok of nobody objects here?
> > >>
> > >> Cheers,
> > >> Aljoscha
> > >>
> > >
> > >Spinning this out as a separate thread.
> > >
> > >+1 -- Sounds great to me!
> > >
> > >Dan
> > >
> > >On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> > >
> > >wrote:
> > >
> > >> Hi,
> > >> thanks for taking the time and writing this extensive doc!
> > >>
> > >> If no-one is against this I would like to be the release manager for
> > >the
> > >> next (0.3.0-incubating) release. I would work with the guide and
> > >update it
> > >> with anything that I learn along the way. Should I open a new thread
> > >for
> > >> this or is it ok of nobody objects here?
> > >>
> > >> Cheers,
> > >> Aljoscha
> > >>
> > >> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré 
> > >wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > well done.
> > >> >
> > >> > As already discussed, it looks good to me ;)
> > >> >
> > >> > Regards
> > >> > JB
> > >> >
> > >> > On 10/20/2016 01:24 AM, Davor Bonaci wrote:
> > >> > > Hi everybody,
> > >> > > As a project, I think we should have a Release Guide to document the
> > >> > > process, have consistent releases, on-board additional release managers,
> > >> > > and generally share knowledge. It is also one of the project graduation
> > >> > > guidelines.
> > >> > >
> > >> > > Dan and I wrote a draft version, documenting the process we followed for
> > >> > > the first two releases. It is currently in a pull request [1]. I'd invite
> > >> > > everyone interested to take a peek and comment, either on the pull request
> > >> > > itself or here on the mailing list, as appropriate.
> > >> > >
> > >> > > Thanks,
> > >> > > Davor
> > >> > >
> > >> > > [1] https://github.com/apache/incubator-beam-site/pull/49
> > >> > >
> > >> >
> > >> > --
> > >> > Jean-Baptiste Onofré
> > >> > jbono...@apache.org
> > >> > http://blog.nanthrax.net
> > >> > Talend - http://www.talend.com
> > >> >
> > >>
> >
>


Re: AppVeyor for Windows compatibility testing

2016-10-19 Thread Davor Bonaci
I think we should use Apache Jenkins to get this coverage. It supports both
cross-platform and cross-JDK coverage. It should be relatively
straightforward to get this enabled.

On Wed, Oct 19, 2016 at 2:54 PM, Lukasz Cwik 
wrote:

> I noticed that the Maven exec plugin was using AppVeyor to run its tests on
> Windows. Since this is a gap in our coverage today, is this something we
> can enable, much like our Travis CI, for the Apache Beam project?
>


Release Guide

2016-10-19 Thread Davor Bonaci
Hi everybody,
As a project, I think we should have a Release Guide to document the
process, have consistent releases, on-board additional release managers,
and generally share knowledge. It is also one of the project graduation
guidelines.

Dan and I wrote a draft version, documenting the process we followed for the
first two releases. It is currently in a pull request [1]. I'd invite
everyone interested to take a peek and comment, either on the pull request
itself or here on the mailing list, as appropriate.

Thanks,
Davor

[1] https://github.com/apache/incubator-beam-site/pull/49


Re: Documentation for IDE setup

2016-10-14 Thread Davor Bonaci
Thanks guys for doing this! A friction-free contributor experience would be
really beneficial.

On Fri, Oct 14, 2016 at 8:51 AM, Jean-Baptiste Onofré 
wrote:

> I'm going to merge it.
>
> Thanks.
>
>
> On 10/14/2016 05:37 PM, Daniel Kulp wrote:
>
>>
>> On Oct 14, 2016, at 10:06 AM, Jesse Anderson wrote:
>>>
>>> Last week I imported Beam with IntelliJ and everything worked.
>>>
>>> That said, I tried to import the Eclipse project and that doesn't compile
>>> anymore. I didn't have time to figure out what happened though.
>>>
>>>
>> I have a pull request https://github.com/apache/incubator-beam/pull/1094
>> that fixes the compile issues.  It has two LGTM’s, just needs someone to
>> merge it.
>>
>> With Eclipse, you need to have all the needed m2e connectors. Some of
>> them (FindBugs, Checkstyle) can be auto-detected and installed when Beam
>> is first imported. The apt one can't. You need to go to the Eclipse
>> Marketplace, install it, then configure it in the Eclipse properties to
>> turn on the “experimental” m2e-apt processing. Once you do that, a
>> refresh of the Maven projects should result in it building/compiling.
>>
>> Running tests is another matter. Since Eclipse compiles everything in a
>> module in one pass (instead of two like Maven), one of the apt processors
>> doesn’t know where to output files and always dumps the files in /classes
>> instead of /test-classes. Thus, any test that relies on a runner will
>> likely fail, because the “test” versions of various services from core
>> get picked up. A simple:
>>
>> rm sdks/java/core/target/classes/META-INF/services/*
>>
>> from the command line will fix that. That should also be documented on
>> the IDE page until someone can figure out how to work around it.
>>
>> Dan
>>
>>
>>
>>> On Fri, Oct 14, 2016 at 1:21 AM Jean-Baptiste Onofré wrote:
>>>
>>>> Hi Christian,
>>>>
>>>> IntelliJ doesn't need any special config (maybe the code style can be
>>>> documented or imported).
>>>>
>>>> Anyway, I agree with adding this to the website in the contribute
>>>> directory. I think it could be part of the contribution guide, as it's
>>>> the first setup step.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 10/14/2016 10:17 AM, Christian Schneider wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am new to the Beam community and am currently making myself
>>>>> familiar with the code. I quickly found the contribution guide and was
>>>>> able to clone the code and build Beam using Maven.
>>>>>
>>>>> The first obstacle I faced was getting the code to build in Eclipse. I
>>>>> naively imported as existing Maven projects but got lots of compile
>>>>> errors. After talking to Dan Kulp we found that this is due to the apt
>>>>> annotation processing for auto value types. Dan explained to me how I
>>>>> need to set up Eclipse to make it work.
>>>>>
>>>>> I still got 5 compile errors (some bound mismatch at Read.bounded, and
>>>>> one ambiguous method empty). These errors seem to be present for
>>>>> everyone using Eclipse and Dan is working on it. So I think this is not a
>>>>> permanent problem.
>>>>>
>>>>> To make it easier for new people I would like to write documentation
>>>>> about the IDE setup. I can cover the Eclipse part but I think IntelliJ
>>>>> should also be described.
>>>>>
>>>>> I already started with it and placed it in /contribute/ide-setup. Does
>>>>> that make sense?
>>>>>
>>>>> I have not linked to it from anywhere yet. I think it should be
>>>>> linked in the contribute/index and in the Contribute menu.
>>>>>
>>>>> Christian
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [REMINDER] Technical discussion on the mailing list

2016-10-05 Thread Davor Bonaci
Daniel, so glad you are starting to contribute to Beam! It was great
talking with you in person back in May. Welcome!

--

There are lots of different things mentioned here; I'll try to address them
separately.

The first use of AutoValue should have been discussed on the dev@ mailing
list. I think the main reason for the discussion is a bit different --
AutoValue has a non-trivial tradeoff -- compile complexity vs. boilerplate
code. For example, AutoValue may degrade IDE experience for some
contributors. If we'd go in depth on this, I'm sure we'd find opposing
opinions on the use of AutoValue. This tradeoff should have been discussed
on the dev@ list, followed by a community decision.

Note that this has happened *before* the JdbcIO work. Since AutoValue has
been already used elsewhere in the project, there was no real reason not to
use it in JdbcIO, as appropriate. Therefore, I think JB and Eugene did
everything right! Second, third or thousandth usage of a concept doesn't
require any particular discussion. They didn't make anything worse. Their
discussion is totally appropriate for code review.

Now, as Daniel points out, I think it is not right to ask a contributor to
change his PR to use AutoValue when none of the existing IO connectors use
it. This sets too high a standard. In fact, it is desirable for new
contributions to follow already established patterns, instead of inventing
something new. If we want to change a pattern, we should do it as a
separate effort across the board.

On the other hand, dev@ discussion wouldn't have helped to prevent review
comments/discussions. Let's say we have had the discussion, and a new
contributor comes a year later. Should we ask her to read all discussions
that ever happened in the project to learn everything she might need? Of
course not! She should follow already established patterns and learn any
specifics during code review. And then, best practices should be documented
on the website.

To summarize, a few things could have been better:
* Discussion of the first use of AutoValue on dev@.
* Avoiding overzealous code review comments.
* Changing a pattern should have been done by filing several starter tasks
in JIRA.

--

There are also several different proposals for altering a part of the
workflow.

> code review comments not making to a list

We have >1000 PRs so far, with at least a dozen comments on average, with
pace increasing. This is >10,000 emails, most of which are "fix a typo".
This leads to information overload, with actual information being missed.

If someone wants this extra information -- just clicking the Watch button
in the GitHub UI will make it happen!

> creating new JIRA and opening PR to dev@

These currently go to commits@. This would have resulted in another 1,700
email threads compared to <150 now.

Generally speaking, *all* of this is already available to anyone who wants
to receive it. However, everyone I know who has tried has given up very
quickly ;). If anybody is concerned, we can create several new lists for
this traffic -- but we shouldn't repurpose dev@ for it.

> I feel I’m missing things as there is significant amount of things not
> happening on a list

I think "feeling of missing things" is totally valid. I feel that too, as
does almost everybody else.

My best answer is -- we should realize that we are an extremely large and
complex project, with >100 contributors and >20 people working on it full
time. Nobody can follow every SDK, every runner, every IO connector, every
pull request, every comment that all contributors make each and every day.

While nobody can follow everything, everything is being followed by
multiple people. And, we need to be accountable to each other to surface
everything relevant to the dev@ list. And I believe that is already
happening the vast majority of the time. This is just one example where it
didn't happen.

--

All that said, there are certainly areas for improvement. If anyone has
specific ideas, please reach out! I'd love to discuss them in detail and
propose improvements to the wider community.

Thanks!


On Wed, Oct 5, 2016 at 6:16 PM, Thomas Weise  wrote:
>
> How about sending just the notifications for creating new JIRA and opening
> PR to dev@ so that those that are interested can start watching?
>
> Thanks,
> Thomas
>
> On Wed, Oct 5, 2016 at 5:33 PM, Dan Halperin 
> wrote:
>
> > On Wed, Oct 5, 2016 at 5:13 PM, Daniel Kulp  wrote:
> >
> > > I just want to give a little more context to this…. I’ve been lurking on
> > > this list for several months now reading everything that’s going on. From
> > > Apache’s standpoint, that should be a “very good start” for getting to know
> > > what is happening in a project.
> > >
> > > On my last PR, Eugene commented about using the AutoValue pattern for part
> > > of it which caught me off guard. None of the other IO’s in master were
> > > using it, there wasn’t any discussion on this list about it, I had no idea
> > > wh

Re: Improvements to issue/version tracking

2016-06-30 Thread Davor Bonaci
Sounds like we have unanimous support. Thanks everyone!

On Thu, Jun 30, 2016 at 5:42 AM, Maximilian Michels  wrote:

> +1
>
> >For us normally resolved issues will always have a development version as
> >"Fix Versions" field, so the issue will only be closed when the version
> >that includes that issue (bug, feature or whatever) actually gets released.
>
> I think it should be optional as Davor suggested because you don't
> always want to fix all open issues in the next release.
>
> On Wed, Jun 29, 2016 at 10:58 PM, Amit Sela  wrote:
> > +1
> >
> > On Wed, Jun 29, 2016 at 12:04 AM Lukasz Cwik 
> > wrote:
> >
> >> +1
> >>
> >> On Tue, Jun 28, 2016 at 12:15 PM, Kenneth Knowles wrote:
> >>
> >> > +1
> >> >
> >> > On Tue, Jun 28, 2016 at 12:06 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >> > wrote:
> >> >
> >> > > +1
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > >
> >> > > On 06/28/2016 01:01 AM, Davor Bonaci wrote:
> >> > >
> >> > >> Hi everyone,
> >> > >> I'd like to propose a simple change in Beam JIRA that will hopefully
> >> > >> improve our issue and version tracking -- to actually use the "Fix
> >> > >> Versions" field as intended [1].
> >> > >>
> >> > >> The goal would be to simplify issue tracking, streamline generation of
> >> > >> release notes, add a view of outstanding work towards a release, and
> >> > >> clearly communicate which Beam version contains fixes for each issue.
> >> > >>
> >> > >> The standard usage of the field is:
> >> > >> * For open (or in-progress/re-opened) issues, "Fix Versions" field is
> >> > >> optional and indicates an unreleased version that this issue is targeting.
> >> > >> The release is not expected to proceed unless this issue is fixed, or the
> >> > >> field is changed.
> >> > >> * For closed (or resolved) issues, "Fix Versions" field indicates a
> >> > >> released or unreleased version that has the fix.
> >> > >>
> >> > >> I think the field should be mandatory once the issue is resolved/closed
> >> > >> [4], so we make a deliberate choice about this. I propose we use "Not
> >> > >> applicable" for all those issues that aren't being resolved as Fixed (e.g.,
> >> > >> duplicates, working as intended, invalid, etc.) and those that aren't
> >> > >> released (e.g., website, build system, etc.).
> >> > >>
> >> > >> We can then trivially view outstanding work for the next release [2], or
> >> > >> generate release notes [3].
> >> > >>
> >> > >> I'd love to hear if there are any comments! I know that at least JB agrees,
> >> > >> as he was convincing me on this -- thanks ;).
> >> > >>
> >> > >> Thanks,
> >> > >> Davor
> >> > >>
> >> > >> [1]
> >> > >> https://confluence.atlassian.com/adminjiraserver071/managing-versions-802592484.html
> >> > >> [2]
> >> > >> https://issues.apache.org/jira/browse/BEAM/fixforversion/12335766/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> >> > >> [3]
> >> > >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12335764
> >> > >> [4] https://issues.apache.org/jira/browse/INFRA-12120
> >> > >>
> >> > >>
> >> > > --
> >> > > Jean-Baptiste Onofré
> >> > > jbono...@apache.org
> >> > > http://blog.nanthrax.net
> >> > > Talend - http://www.talend.com
> >> > >
> >> >
> >>
>


Beam at Hadoop Summit 2016 in San Jose, CA

2016-06-27 Thread Davor Bonaci
Hi everyone,
I'm happy to share that Apache Beam will be featured at the Hadoop Summit
in San Jose, CA later this week [1].

There'll be at least 2 sessions:

* The Next Generation of Data Processing & OSS
Speakers: James Malone, Eric Schmidt
Wednesday, June 29, 2016 @ 2:10 PM

* Apache Beam: A Unified Model for Batch and Streaming Data Processing
Speaker: Davor Bonaci
Thursday, June 30, 2016 @ 4:10 PM

If you'll be attending the conference and would like to talk about
all-things-Beam, please reach out!

Thanks,
Davor

[1] http://hadoopsummit.org/san-jose/


Davor to be out for a little bit

2016-06-27 Thread Davor Bonaci
Hi everyone,
Just a quick note -- I'll be out of my regular duties at work for a little
bit starting after this Friday, 7/1, and may be less active on mailing
lists, JIRA, and code reviews during this time.

I'll be back at full capacity soon, but the exact return date is
to-be-determined.

Thanks,
Davor


Improvements to issue/version tracking

2016-06-27 Thread Davor Bonaci
Hi everyone,
I'd like to propose a simple change in Beam JIRA that will hopefully
improve our issue and version tracking -- to actually use the "Fix
Versions" field as intended [1].

The goal would be to simplify issue tracking, streamline generation of
release notes, add a view of outstanding work towards a release, and
clearly communicate which Beam version contains fixes for each issue.

The standard usage of the field is:
* For open (or in-progress/re-opened) issues, "Fix Versions" field is
optional and indicates an unreleased version that this issue is targeting.
The release is not expected to proceed unless this issue is fixed, or the
field is changed.
* For closed (or resolved) issues, "Fix Versions" field indicates a
released or unreleased version that has the fix.

I think the field should be mandatory once the issue is resolved/closed
[4], so we make a deliberate choice about this. I propose we use "Not
applicable" for all those issues that aren't being resolved as Fixed (e.g.,
duplicates, working as intended, invalid, etc.) and those that aren't
released (e.g., website, build system, etc.).
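
The rules above can be sketched as a small check. This is only an illustration in plain Python, not something JIRA provides: the `Issue` class, the `check_fix_versions` function, and the status and resolution strings are assumed names chosen for the example, and any issue keys or version numbers passed to it are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Placeholder value proposed for issues that never ship in a release.
NOT_APPLICABLE = "Not applicable"

# Assumed status names; a real JIRA workflow may use different ones.
OPEN_STATUSES = {"Open", "In Progress", "Reopened"}
DONE_STATUSES = {"Resolved", "Closed"}


@dataclass
class Issue:
    """Hypothetical, minimal stand-in for a JIRA issue."""
    key: str
    status: str
    resolution: Optional[str] = None  # e.g. "Fixed", "Duplicate", "Invalid"
    fix_versions: List[str] = field(default_factory=list)


def check_fix_versions(issue: Issue, released: Set[str]) -> List[str]:
    """Return a list of rule violations; an empty list means the issue is fine."""
    problems = []
    if issue.status in OPEN_STATUSES:
        # Optional for open issues, but if set it must target an unreleased version.
        for version in issue.fix_versions:
            if version in released:
                problems.append(
                    f"{issue.key}: open issue targets released version {version}")
    elif issue.status in DONE_STATUSES:
        if not issue.fix_versions:
            # Mandatory once the issue is resolved or closed.
            problems.append(f"{issue.key}: 'Fix Versions' is required")
        elif issue.resolution != "Fixed" and issue.fix_versions != [NOT_APPLICABLE]:
            # Duplicates, invalid, working-as-intended, etc. never ship a fix.
            problems.append(
                f"{issue.key}: resolution {issue.resolution!r} should use "
                f"'{NOT_APPLICABLE}'")
    return problems
```

Issues resolved as Fixed may carry either a real version or "Not applicable" (for example, website or build-system changes), so the sketch only requires that the field is non-empty in that case.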

We can then trivially view outstanding work for the next release [2], or
generate release notes [3].

I'd love to hear if there are any comments! I know that at least JB agrees,
as he was convincing me on this -- thanks ;).

Thanks,
Davor

[1]
https://confluence.atlassian.com/adminjiraserver071/managing-versions-802592484.html
[2]
https://issues.apache.org/jira/browse/BEAM/fixforversion/12335766/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
[3]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12335764
[4] https://issues.apache.org/jira/browse/INFRA-12120


Re: newbie question about beam

2016-06-16 Thread Davor Bonaci
We are in the process of porting Cloud Dataflow documentation to Beam, so I'll
give you a mix of Dataflow and Beam links.

FilesToStage is a pipeline option [1], [2]. Super-easy to use.
Side inputs are a ParDo concept [3].

If you hit any rough edges, please let us know -- I'd be glad to help!

[1]
https://cloud.google.com/dataflow/pipelines/specifying-exec-params#setting-other-cloud-pipeline-options
[2]
https://beam.incubator.apache.org/javadoc/0.1.0-incubating/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.html#getFilesToStage--
[3] https://cloud.google.com/dataflow/model/par-do#side-inputs

On Thu, Jun 16, 2016 at 1:40 AM, Sergio Fernández  wrote:

> Hi Davor,
>
> On Thu, Jun 16, 2016 at 3:04 AM, Davor Bonaci 
> wrote:
>
> > This is a really good question, Sergio. You got right away to the crux of
> > the problem -- how to express such a pattern in the Beam model.
> >
> > The answer depends on whether the data is static, e.g., whether it is known
> > at pipeline construction time / computed in the earlier stages of the
> > pipeline, or perhaps evolving during pipeline execution. I'll give a
> > high-level answer -- feel free to share more information about your use
> > case and we can drill into specific details.
> >
>
> Well, as I said, for us it is more interesting to use Beam at processing
> time than for training purposes. In the past we have experimented a bit with
> approaches like TensorSpark <https://github.com/adatao/tensorspark>, but
> the critical aspect is exploitation of the models. Therefore we could
> assume the models are static data.
>
> > In the simplest case, Beam supports a "files to stage" concept if the data
> > is known a priori. In this case, runners will distribute the data to all
> > workers before computation starts, and your logic can depend on the data
> > being available locally on each worker.
> >
>
> Oh, cool. Something like that would be more than enough for now. Can you
> please point me to any documentation or code I could use to play with it?
>
> > If this is not sufficient, Beam's side inputs are the right primitive. We
> > support several access patterns for side inputs, including distributed
> > lookup and various types of caching. This can work really well,
> > particularly with a well-optimized runner.
> >
>
> Interesting... any (early) documentation (or code) about such a feature?
>
> > Other alternatives typically include access to shared storage, which is a
> > lower-level approach and often requires more work.
>
> Sure, shared storage is always an option, but for many reasons I'd rather
> not resort to such an approach.
>
> Thanks so much for all the ideas and valuable discussions!
>
> Cheers,
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernan...@redlink.co
> w: http://redlink.co
>


Re: newbie question about beam

2016-06-15 Thread Davor Bonaci
This is a really good question, Sergio. You got right away to the crux of
the problem -- how to express such a pattern in the Beam model.

The answer depends on whether the data is static, e.g., whether it is known at
pipeline construction time / computed in the earlier stages of the
pipeline, or perhaps evolving during pipeline execution. I'll give a
high-level answer -- feel free to share more information about your use
case and we can drill into specific details.

In the simplest case, Beam supports a "files to stage" concept if the data is
known a priori. In this case, runners will distribute the data to all
workers before computation starts, and your logic can depend on the data
being available locally on each worker.

If this is not sufficient, Beam's side inputs are the right primitive. We
support several access patterns for side inputs, including distributed
lookup and various types of caching. This can work really well,
particularly with a well-optimized runner.

Other alternatives typically include access to shared storage, which is a
lower-level approach and often requires more work.
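
To make the side-input idea concrete, here is a small conceptual model in plain Python. It deliberately does not use the Beam SDK: `par_do`, `score`, and the sample data are hypothetical names invented for this sketch, and a real runner would distribute or cache the side input across workers rather than pass around a local dict.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")  # element type of the main input
S = TypeVar("S")  # type of the side input
R = TypeVar("R")  # result type


def par_do(elements: Iterable[T], fn: Callable[[T, S], R], side_input: S) -> List[R]:
    """Apply fn to every main-input element, handing each call the same
    read-only side input (here: a small, static lookup table)."""
    return [fn(element, side_input) for element in elements]


def score(word: str, weights: Dict[str, float]) -> float:
    """Per-element logic that looks a value up in the side input."""
    return weights.get(word, 0.0)


# Hypothetical static data, standing in for something like a trained model.
weights = {"beam": 2.0, "runner": 1.5}
scores = par_do(["beam", "spark", "runner"], score, weights)
```

The point of the pattern is that the per-element function stays free of any knowledge about how the extra data reached the worker; that part is left to the runner.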

--

Back to Ismael's question -- Beam is great at orchestrating such pipelines.
You can build a pipeline that prepares data for a custom system, manages
its invocation, and processes its output. PTransforms can encapsulate
arbitrary computation, including invocation of outside logic or an outside
system. It would be great to have a set of PTransform libraries that wrap
such computations.

On Wed, Jun 15, 2016 at 2:45 AM, Jean-Baptiste Onofré 
wrote:

> I would say DSL + PTransform should work.
>
> But certainly some PoC to do ;)
>
> Regards
> JB
>
>
> On 06/15/2016 11:39 AM, Ismaël Mejía wrote:
>
>> One interesting point that Sergio mentions and that is getting lost in
>> the discussion is how to integrate other dataflow-style frameworks into
>> Beam, e.g. TensorFlow. I am really curious about what the others have to
>> say about this, since this is probably one question that will come up once
>> more users write pipelines on Beam. Any ideas on this? Or is the solution
>> just to write some 'integration PTransforms' and that's it?
>>
>> Regards,
>> Ismaël
>>
>> ps. I forgot to say Hi and welcome Sergio :).
>>
>>
>> On Wed, Jun 15, 2016 at 11:18 AM, Jean-Baptiste Onofré wrote:
>>
>>> Not the Beam Model for sure (the Beam Model is about the pipeline design).
>>>
>>> The Beam Runner API can help there, but the final implementation is in the
>>> runner itself.
>>>
>>> Regards
>>> JB
>>>
>>> On 06/15/2016 10:18 AM, Sergio Fernández wrote:
>>>
>>>> Hi Jean-Baptiste,
>>>>
>>>> On Tue, Jun 14, 2016 at 12:45 PM, Jean-Baptiste Onofré wrote:
>>>>
>>>>> Welcome aboard, and good to discuss with you during ApacheCon.
>>>>
>>>> It was nice to put faces to you all ;-)
>>>>
>>>>> Distribution of the resources is a point related to runner, and more
>>>>> specifically to the execution environment of the runner. Each
>>>>> runner/backend will implement their own logic.
>>>>
>>>> Yes, I can understand. But I wonder if the Beam Model provides any
>>>> primitive to deal with such aspects in an abstract way. I guess I'd need
>>>> to go deeper into Beam to approach you with more concrete questions; so
>>>> for now it's fine.
>>>>
>>>>> Regarding the Python SDK, we discussed that last week: it's on the
>>>>> way. We should have the Python SDK very soon (we were busy with the
>>>>> first release).
>>>>
>>>> Yep, I knew that was the plan. It's really cool to have it already in
>>>> master for the next release :-)
>>>>
>>>> Thanks.
>>>>
>>>>> On 06/14/2016 12:38 PM, Sergio Fernández wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I'm a newbie in the Beam community, but as someone who has used DataFlow
>>>>>> in the past I've been following the podling since you came to the ASF.
>>>>>> I'm very happy to see that 0.1.0-incubating is finally going out;
>>>>>> congratulations on such a great milestone.
>>>>>>
>>>>>> I talked with some of you at the last ApacheCon, and it was good for me
>>>>>> to learn that the Python SDK was just a matter of time and should come
>>>>>> to Beam at some point. So coming back to the original plans
>>>>>> <http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
>>>>>> do you have any timeline for bringing the Python SDK to Beam?
>>>>>>
>>>>>> I'd also like to ask how Beam plans to deal with the
>>>>>> distribution of resources across all nodes, something I know is not
>>>>>> really clean with some runners (e.g., Spark). More concretely, we're
>>>>>> using Keras <http://keras.io/>, a deep learning Python library that is
>>>>>> capable of running on top of either TensorFlow or Theano. Historically I
>>>>>> know DataFlow and TensorFlow are not very compatible. But I wonder if the
>>>>>> project has

Re: Talking About Beam

2016-06-15 Thread Davor Bonaci
Great work Jesse!

On Wed, Jun 15, 2016 at 10:59 AM, amir bahmanyari <
amirto...@yahoo.com.invalid> wrote:

> Totally agree with Amit. I am doing benchmarking by processing the
> in-depth Linear Road stream data rather than benchmarking WordCount.
> WordCount provides you great starting metrics for what it's supposed to
> benefit us. You need to target your own "processing space", if I am using
> the right wording, to move on with the right technology for you. Cheers
>
>   From: Jean-Baptiste Onofré 
>  To: dev@beam.incubator.apache.org
>  Sent: Wednesday, June 15, 2016 12:49 AM
>  Subject: Re: Talking About Beam
>
> Fully agree with Amit.
>
> Good job Jesse !
>
> Regards
> JB
>
> On 06/15/2016 09:37 AM, Amit Sela wrote:
> > Great writing Jesse!
> >
> > From my experience in the last year, working on a stream processing (and
> > generally data processing) platform at PayPal, Beam could also offer a
> > great approach for large projects - up until now (and in my case as well),
> > the process was:
> >
> >    1. Research and paper analysis of existing frameworks.
> >    2. Understand your needs.
> >    3. Choose (and commit to) a specific technology - example: Spark.
> >    4. Get to work.
> >
> > I believe Beam could change this into something better, such as:
> >
> >    1. Understand your needs, and start working on them.
> >    2. Combine your research with actually running (your) same code on
> >       different frameworks - probably better than "WordCount" benchmarks.
> >    3. Choose the best framework for you, or choose more than one if the
> >       benefit is worth the overhead.
> >    4. While working on 2 & 3, you keep going forward with your project!
> >
> > I talked about Beam at the Barclays-Techstars Accelerator in Israel last
> > month because I totally agree that it's a great starting point for
> > startups, but I think this is an example of why it's not just for
> > startups :)
> >
> > Thanks,
> > Amit
> >
> > On Wed, Jun 15, 2016 at 9:58 AM Jesse Anderson 
> > wrote:
> >
> >> I wrote a piece published on O'Reilly about Beam
> >>
> >>
> https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code?utm_medium=social&utm_source=twitter.com&utm_campaign=lgen&utm_content=data+article+ki&cmp=tw-data-na-article-lgen_tw_article
> >> .
> >> It gives some of the thoughts and ideas that will help Beam adoption. I
> >> suggest reading it to get some ideas for how to talk about Beam at talks
> >> and conferences.
> >>
> >> Before writing the piece, I tested how it resonates with people. These
> >> really help people understand why Beam is used and how it solves the
> >> future proofing and scale proofing problems small companies face.
> >>
> >> Thanks,
> >>
> >> Jesse
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>
>


[dev] Announcing 0.1.0-incubating release

2016-06-15 Thread Davor Bonaci
Hi everyone,
I’m happy to announce that we have completed our first release – version
0.1.0-incubating is now available [1].

I'm thrilled about this -- it is an exciting milestone for the project!

I'd like to thank *all* contributors [2] -- this milestone is the result of
truly great work by the entire community. Special thanks go to Frances
Perry, Dan Halperin and, of course, our mentor Jean-Baptiste Onofré, who
was instrumental with his guidance on the Apache way.

Davor

[1]
https://beam.incubator.apache.org/beam/release/2016/06/15/first-release.html
[2]
https://github.com/apache/incubator-beam/graphs/contributors?from=2016-02-26&to=2016-06-15&type=c


Re: newbie question about beam

2016-06-14 Thread Davor Bonaci
Hi Sergio,
It was great talking with you in Vancouver.

As of today, the Python SDK is here, [1], [2]. Wasn't that fast enough ;)

Davor

[1] https://github.com/apache/incubator-beam/pull/461
[2] https://github.com/apache/incubator-beam/tree/python-sdk/sdks/python

On Tue, Jun 14, 2016 at 3:45 AM, Jean-Baptiste Onofré 
wrote:

> Hi Sergio,
>
> Welcome aboard, and good to discuss with you during ApacheCon.
>
> Distribution of the resources is a point related to runner, and more
> specifically to the execution environment of the runner. Each
> runner/backend will implement their own logic.
>
> I don't know Keras well enough to give strong advice.
>
> Regarding the Python SDK, we discussed that last week: it's on the
> way. We should have the Python SDK very soon (we were busy with the first
> release).
>
> Regards
> JB
>
>
> On 06/14/2016 12:38 PM, Sergio Fernández wrote:
>
>> Hi guys,
>>
>> I'm a newbie in the Beam community, but as someone who has used DataFlow in
>> the past I've been following the podling since you came to the ASF. I'm very
>> happy to see that 0.1.0-incubating is finally going out; congratulations
>> on such a great milestone.
>>
>> I talked with some of you at the last ApacheCon, and it was good for me
>> to learn that the Python SDK was just a matter of time and should come to
>> Beam at some point. So coming back to the original plans
>> <http://beam.incubator.apache.org/beam/python/sdk/2016/02/25/python-sdk-now-public.html>,
>> do you have any timeline for bringing the Python SDK to Beam?
>>
>> I'd also like to ask how Beam plans to deal with the
>> distribution of resources across all nodes, something I know is not really
>> clean with some runners (e.g., Spark). More concretely, we're using Keras
>> <http://keras.io/>, a deep learning Python library that is capable of
>> running on top of either TensorFlow or Theano. Historically I know DataFlow
>> and TensorFlow are not very compatible. But I wonder if the project has
>> already discussed how to support running Keras (TensorFlow) tasks on Beam.
>> For us it is more for querying than for training, so I'd like to know if the
>> Beam Model could natively support the distribution of the models (sometimes
>> several GB).
>>
>> Thanks in advance.
>>
>> Cheers,
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [RESULT] [VOTE] Release version 0.1.0-incubating

2016-06-14 Thread Davor Bonaci
The Apache Incubator has unanimously approved this release, with 6
approving and binding votes.

We are now proceeding with the final steps of the release.

On Sun, Jun 12, 2016 at 2:33 PM, Ismaël Mejía  wrote:

> Congratulations Davor; you, JB and all the team have done a great job. I am
> really happy to see this release going out!
>
> And remember they used to say that the first Apache release is the hardest
> one, so from now on it should be easier :)
>
>
> On Sun, Jun 12, 2016 at 8:23 AM, Jesse Anderson 
> wrote:
>
> > Congrats on the first release!
> >
> > On Sun, Jun 12, 2016, 7:50 AM Davor Bonaci 
> > wrote:
> >
> > > I'm happy to announce that we have unanimously approved this release.
> > >
> > > There are 10 approving votes, 9 of which are binding:
> > > * Davor Bonaci
> > > * Robert Bradshaw
> > > * Ben Chambers
> > > * Dan Halperin
> > > * Kenneth Knowles
> > > * Aljoscha Krettek
> > > * James Malone
> > > * Jean-Baptiste Onofré
> > > * Amit Sela
> > > * Scott Wegner
> > >
> > > There are no disapproving votes.
> > >
> > > At this point, this proposal will be presented to the Apache Incubator
> > for
> > > their review.
> > >
> > > Thanks everyone! Personally, I'm super excited to see our first release
> > > getting so close!
> > >
> > > Davor
> > >
> > > -- Forwarded message --
> > > From: Davor Bonaci 
> > > Date: Wed, Jun 8, 2016 at 4:20 PM
> > > Subject: [VOTE] Release version 0.1.0-incubating
> > > To: dev@beam.incubator.apache.org
> > >
> > >
> > > Hi everyone,
> > > Here's the first vote for the first release of Apache Beam -- version
> > > 0.1.0-incubating!
> > >
> > > As a reminder, we aren't looking for any specific new functionality,
> but
> > > would like to release the existing code, get something to our users'
> > hands,
> > > and test the processes. Previous discussions and iterations on this
> > release
> > > have been archived on the dev@ mailing list.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [1],
> > > and
> > > * all artifacts to be deployed to the Maven Central Repository [2].
> > >
> > > This corresponds to the tag "v0.1.0-incubating-RC3" in source control,
> > [3].
> > >
> > > Please vote as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > For those of us enjoying our first voting experience -- the release
> > > checklist is here [4]. This is a "package release"-type of the Apache
> > > voting process [5]. As customary, the vote will be open for 72 hours.
> It
> > is
> > > adopted by majority approval with at least 3 PPMC affirmative votes. If
> > > approved, the proposal will be presented to the Apache Incubator for
> > their
> > > review.
> > >
> > > Thanks,
> > > Davor
> > >
> > > [1]
> > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachebeam-1002/org/apache/beam/beam-parent/0.1.0-incubating/beam-parent-0.1.0-incubating-source-release.zip
> > > [2]
> > https://repository.apache.org/content/repositories/orgapachebeam-1002/
> > > [3]
> https://github.com/apache/incubator-beam/tree/v0.1.0-incubating-RC3
> > > [4]
> http://incubator.apache.org/guides/releasemanagement.html#check-list
> > > [5] http://www.apache.org/foundation/voting.html
> > >
> >
>


Re: Apache Beam for Python

2016-06-14 Thread Davor Bonaci
Awesome job, Silviu! Really excited to have Python SDK join us in Beam.

I'll take care of merging the pull request. Let's start with a feature
branch, as per previous conversations on the dev@ list.

On Tue, Jun 14, 2016 at 12:22 PM, Silviu Calinoiu <
silv...@google.com.invalid> wrote:

> Thanks everybody for the welcoming and feedback. The initial code move was
> proposed as pull request #461 [1].
>
> Looking forward to working with everybody in the Beam community and
> especially any Pythonistas out there.
>
> Thanks,
> Silviu
>
> [1] https://github.com/apache/incubator-beam/pull/461
>
> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía  wrote:
>
> > Excellent guys, Welcome to Beam !
> >
> > > I am looking for ways to integrate Beam with the standard notebook tools
> > > (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
> > > arriving in Beam. Awesome.
> >
> > Ismaël Mejía
> >
> > On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela  wrote:
> >
> > > Welcome Python people ;)
> > >
> > > I know a few people who've been waiting for this one!
> > >
> > > On Fri, Jun 3, 2016, 19:53 Davor Bonaci 
> > wrote:
> > >
> > > > Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
> > > >
> > > > On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> > > > wrote:
> > > >
> > > > > Absolutely ;)
> > > > >
> > > > >
> > > > > On 06/03/2016 03:51 PM, James Malone wrote:
> > > > >
> > > > >> Hey Silviu!
> > > > >>
> > > > >> I think JB is proposing we create a python directory in the sdks
> > > > directory
> > > > >> in the root repository (and modify the configuration files
> > > accordingly):
> > > > >>
> > > > >> https://github.com/apache/incubator-beam/tree/master/sdks
> > > > >>
> > > > >> This Beam document here titled "Apache Beam (Incubating):
> Repository
> > > > >> Structure" details the proposed repository structure and may be
> > > useful:
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> James
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
> > > > >> 
> > > > >> wrote:
> > > > >>
> > > > >> Hi JB,
> > > > >>> Thanks for the welcome! I come from the Python land so  I am not
> > > quite
> > > > >>> familiar with Maven. What do you mean by a Maven module? You mean
> > an
> > > > >>> artifact so you can install things? In Python, people are used to
> > > > >>> packages
> > > > >>> downloaded from PyPI (pypi.python.org -- which is sort of Maven
> > for
> > > > >>> Python). Whatever is the standard way of doing things in Apache
> > we'll
> > > > do
> > > > >>> it. Just asking for clarifications.
> > > > >>>
> > > > >>> By the way this discussion is very useful since we will have to
> > iron
> > > > out
> > > > >>> several details like this.
> > > > >>> Thanks,
> > > > >>> Silviu
> > > > >>>
> > > > >>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <
> > > j...@nanthrax.net>
> > > > >>> wrote:
> > > > >>>
> > > > >>> Hi Silviu,
> > > > >>>>
> > > > >>>> thanks for detailed update and great work !
> > > > >>>>
> > > > > >>>> I would advise creating a:
> > > > >>>>
> > > > >>>> sdks/python
> > > > >>>>
> > > > >>>> Maven module to store the Python SDK.
> > > > >>>>
> > > > >>>> WDYT ?
> > > > >>>>
> > > > >>>> By the way, welc

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #70

2016-06-13 Thread Davor Bonaci
#70 was a temporary networking issue; a different issue than #69, which
failed creating a local directory.

On Mon, Jun 13, 2016 at 10:00 AM, Jean-Baptiste Onofré 
wrote:

> Thanks for the update Davor.
>
> I don't understand why the local dir can't be created. Filesystem issue ?
>
> Regards
> JB
>
>
> On 06/13/2016 06:39 PM, Davor Bonaci wrote:
>
>> This was a temporary infrastructure issue. No action needed.
>>
>> Failed to deploy artifacts: Could not transfer artifact
>>
>> org.apache.beam:beam-runners-flink_2.10:jar:0.2.0-incubating-20160612.074627-5
>> from/to apache.snapshots.https (
>> https://repository.apache.org/content/repositories/snapshots): Failed to
>> transfer file:
>>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-runners-flink_2.10/0.2.0-incubating-SNAPSHOT/beam-runners-flink_2.10-0.2.0-incubating-20160612.074627-5.jar
>> .
>> Return code is: 502, ReasonPhrase: Proxy Error. -> [Help 1]
>>
>> On Sun, Jun 12, 2016 at 12:48 AM, Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>>
>> See <https://builds.apache.org/job/beam_Release_NightlySnapshot/70/>
>>>
>>> --
>>> [...truncated 4248 lines...]
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.509 sec
>>> - in org.apache.beam.sdk.transforms.windowing.WindowTest
>>> Running org.apache.beam.sdk.transforms.windowing.WindowingTest
>>> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.559 sec
>>> - in org.apache.beam.sdk.transforms.windowing.WindowingTest
>>> Running org.apache.beam.sdk.transforms.ValuesTest
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.264 sec
>>> - in org.apache.beam.sdk.transforms.ValuesTest
>>> Running org.apache.beam.sdk.transforms.PartitionTest
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.658 sec
>>> - in org.apache.beam.sdk.transforms.PartitionTest
>>> Running org.apache.beam.sdk.transforms.ParDoTest
>>> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.647
>>> sec - in org.apache.beam.sdk.transforms.ParDoTest
>>> Running org.apache.beam.sdk.transforms.CreateTest
>>> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.782 sec
>>> - in org.apache.beam.sdk.transforms.CreateTest
>>> Running org.apache.beam.sdk.transforms.CombineTest
>>> Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.883
>>> sec - in org.apache.beam.sdk.transforms.CombineTest
>>> Running org.apache.beam.sdk.transforms.KvSwapTest
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.219 sec
>>> - in org.apache.beam.sdk.transforms.KvSwapTest
>>> Running org.apache.beam.sdk.util.ReshuffleTest
>>> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.581 sec
>>> - in org.apache.beam.sdk.util.ReshuffleTest
>>>
>>> Results :
>>>
>>> Tests run: 168, Failures: 0, Errors: 0, Skipped: 1
>>>
>>> [JENKINS] Recording test results
>>> [INFO]
>>> [INFO] --- maven-surefire-plugin:2.18.1:test
>>> (streaming-runnable-on-service-tests) @ beam-runners-flink_2.10 ---
>>> [INFO] Tests are skipped.
>>> [INFO]
>>> [INFO] --- apache-rat-plugin:0.11:check (default) @
>>> beam-runners-flink_2.10 ---
>>> [INFO] 51 implicit excludes (use -debug for more details).
>>> [INFO] Exclude: **/target/**/*
>>> [INFO] Exclude: **/dependency-reduced-pom.xml
>>> [INFO] Exclude: .github/**/*
>>> [INFO] Exclude: **/*.iml
>>> [INFO] Exclude: **/package-list
>>> [INFO] Exclude: **/user.avsc
>>> [INFO] Exclude: **/test/resources/**/*.txt
>>> [INFO] Exclude: **/test/**/.placeholder
>>> [INFO] Exclude: .repository/**/*
>>> [INFO] 74 resources included (use -debug for more details)
>>> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0
>>> approved: 74 licence.
>>> [INFO]
>>> [INFO] --- maven-failsafe-plugin:2.19.1:verify (default) @
>>> beam-runners-flink_2.10 ---
>>> [INFO] Failsafe report directory: <
>>>
>>> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/target/failsafe-reports
>>>
>>>>
>>>> [JENKINS] Recording test results
>>> [INFO]
>>> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @
>>> beam-runners-flink_2.10 ---
>&

Re: Jenkins build became unstable: beam_Release_NightlySnapshot #69

2016-06-13 Thread Davor Bonaci
Error Message
java.io.IOException: Failed to create local dir in
/tmp/blockmgr-b4594969-d5da-48bb-8572-4521874515b3/0e.

Stacktrace
java.lang.RuntimeException: java.io.IOException: Failed to create local dir
in /tmp/blockmgr-b4594969-d5da-48bb-8572-4521874515b3/0e.

Reason unknown.

On Sat, Jun 11, 2016 at 12:36 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
>


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #70

2016-06-13 Thread Davor Bonaci
This was a temporary infrastructure issue. No action needed.

Failed to deploy artifacts: Could not transfer artifact
org.apache.beam:beam-runners-flink_2.10:jar:0.2.0-incubating-20160612.074627-5
from/to apache.snapshots.https (
https://repository.apache.org/content/repositories/snapshots): Failed to
transfer file:
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-runners-flink_2.10/0.2.0-incubating-SNAPSHOT/beam-runners-flink_2.10-0.2.0-incubating-20160612.074627-5.jar.
Return code is: 502, ReasonPhrase: Proxy Error. -> [Help 1]

On Sun, Jun 12, 2016 at 12:48 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> --
> [...truncated 4248 lines...]
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.509 sec
> - in org.apache.beam.sdk.transforms.windowing.WindowTest
> Running org.apache.beam.sdk.transforms.windowing.WindowingTest
> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.559 sec
> - in org.apache.beam.sdk.transforms.windowing.WindowingTest
> Running org.apache.beam.sdk.transforms.ValuesTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.264 sec
> - in org.apache.beam.sdk.transforms.ValuesTest
> Running org.apache.beam.sdk.transforms.PartitionTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.658 sec
> - in org.apache.beam.sdk.transforms.PartitionTest
> Running org.apache.beam.sdk.transforms.ParDoTest
> Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.647
> sec - in org.apache.beam.sdk.transforms.ParDoTest
> Running org.apache.beam.sdk.transforms.CreateTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.782 sec
> - in org.apache.beam.sdk.transforms.CreateTest
> Running org.apache.beam.sdk.transforms.CombineTest
> Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.883
> sec - in org.apache.beam.sdk.transforms.CombineTest
> Running org.apache.beam.sdk.transforms.KvSwapTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.219 sec
> - in org.apache.beam.sdk.transforms.KvSwapTest
> Running org.apache.beam.sdk.util.ReshuffleTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.581 sec
> - in org.apache.beam.sdk.util.ReshuffleTest
>
> Results :
>
> Tests run: 168, Failures: 0, Errors: 0, Skipped: 1
>
> [JENKINS] Recording test results
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test
> (streaming-runnable-on-service-tests) @ beam-runners-flink_2.10 ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- apache-rat-plugin:0.11:check (default) @
> beam-runners-flink_2.10 ---
> [INFO] 51 implicit excludes (use -debug for more details).
> [INFO] Exclude: **/target/**/*
> [INFO] Exclude: **/dependency-reduced-pom.xml
> [INFO] Exclude: .github/**/*
> [INFO] Exclude: **/*.iml
> [INFO] Exclude: **/package-list
> [INFO] Exclude: **/user.avsc
> [INFO] Exclude: **/test/resources/**/*.txt
> [INFO] Exclude: **/test/**/.placeholder
> [INFO] Exclude: .repository/**/*
> [INFO] 74 resources included (use -debug for more details)
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0
> approved: 74 licence.
> [INFO]
> [INFO] --- maven-failsafe-plugin:2.19.1:verify (default) @
> beam-runners-flink_2.10 ---
> [INFO] Failsafe report directory: <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/target/failsafe-reports
> >
> [JENKINS] Recording test results
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @
> beam-runners-flink_2.10 ---
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/target/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT.jar>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/beam-runners-flink_2.10/0.2.0-incubating-SNAPSHOT/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT.jar
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/pom.xml>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/beam-runners-flink_2.10/0.2.0-incubating-SNAPSHOT/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT.pom
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/target/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT-tests.jar>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/beam-runners-flink_2.10/0.2.0-incubating-SNAPSHOT/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT-tests.jar
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/flink/runner/target/beam-runners-flink_2.10-0.2.0-incubating-SNAPSHOT-sources.jar>
> to <
> https://builds.apache.org/job/beam_Rele

[RESULT] [VOTE] Release version 0.1.0-incubating

2016-06-11 Thread Davor Bonaci
I'm happy to announce that we have unanimously approved this release.

There are 10 approving votes, 9 of which are binding:
* Davor Bonaci
* Robert Bradshaw
* Ben Chambers
* Dan Halperin
* Kenneth Knowles
* Aljoscha Krettek
* James Malone
* Jean-Baptiste Onofré
* Amit Sela
* Scott Wegner

There are no disapproving votes.

At this point, this proposal will be presented to the Apache Incubator for
their review.

Thanks everyone! Personally, I'm super excited to see our first release
getting so close!

Davor

---------- Forwarded message ----------
From: Davor Bonaci 
Date: Wed, Jun 8, 2016 at 4:20 PM
Subject: [VOTE] Release version 0.1.0-incubating
To: dev@beam.incubator.apache.org


Hi everyone,
Here's the first vote for the first release of Apache Beam -- version
0.1.0-incubating!

As a reminder, we aren't looking for any specific new functionality, but
would like to release the existing code, get something to our users' hands,
and test the processes. Previous discussions and iterations on this release
have been archived on the dev@ mailing list.

The complete staging area is available for your review, which includes:
* the official Apache source release to be deployed to dist.apache.org [1],
and
* all artifacts to be deployed to the Maven Central Repository [2].

This corresponds to the tag "v0.1.0-incubating-RC3" in source control, [3].

Please vote as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

For those of us enjoying our first voting experience -- the release
checklist is here [4]. This is a "package release"-type of the Apache
voting process [5]. As customary, the vote will be open for 72 hours. It is
adopted by majority approval with at least 3 PPMC affirmative votes. If
approved, the proposal will be presented to the Apache Incubator for their
review.

Thanks,
Davor

[1]
https://repository.apache.org/content/repositories/orgapachebeam-1002/org/apache/beam/beam-parent/0.1.0-incubating/beam-parent-0.1.0-incubating-source-release.zip
[2] https://repository.apache.org/content/repositories/orgapachebeam-1002/
[3] https://github.com/apache/incubator-beam/tree/v0.1.0-incubating-RC3
[4] http://incubator.apache.org/guides/releasemanagement.html#check-list
[5] http://www.apache.org/foundation/voting.html


Re: [VOTE] Release version 0.1.0-incubating

2016-06-11 Thread Davor Bonaci
This vote is now complete. We'll summarize the results and next steps in
the [RESULT] thread.

Thanks everyone!

On Thu, Jun 9, 2016 at 11:29 AM, Robert Bradshaw <
rober...@google.com.invalid> wrote:

> +1 (binding)
>
> I also spot-checked the signatures, they look good.
>
> On Thu, Jun 9, 2016 at 10:32 AM, James Malone
>  wrote:
> > +1 (binding)
> >
> > On Thu, Jun 9, 2016 at 10:15 AM, Aljoscha Krettek 
> > wrote:
> >
> >> +1 (binding)
> >>
> >> I ran "mvn clean verify" on the source package, executed WordCount using
> >> the FlinkPipelineRunner. NOTICE, LICENSE and DISCLAIMER also look good
> >>
> >> On Thu, 9 Jun 2016 at 18:50 Dan Halperin 
> >> wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > per checklist 2.1, I decompressed the source-release zip and ensured
> that
> >> > `mvn clean verify` passed. per 3.6, I confirmed that there are no
> binary
> >> > files. I also did a few other miscellaneous checks.
> >> >
> >> > On Thu, Jun 9, 2016 at 8:48 AM, Kenneth Knowles
> 
> >> > wrote:
> >> >
> >> > > +1 (binding)
> >> > >
> >> > > Confirmed that we can run pipelines on Dataflow.
> >> > >
> >> > > Looks good. Very exciting!
> >> > >
> >> > >
> >> > > On Thu, Jun 9, 2016 at 8:16 AM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> >> > > wrote:
> >> > >
> >> > > > Team work ! Special thanks to Davor and Dan ! And thanks to the
> >> entire
> >> > > > team: it's a major step forward (the first release is always the
> >> > hardest
> >> > > > one ;)). Let's see how the release will be taken by the IPMC :)
> >> > > >
> >> > > > Regards
> >> > > > JB
> >> > > >
> >> > > >
> >> > > > On 06/09/2016 04:32 PM, Scott Wegner wrote:
> >> > > >
> >> > > >> +1
> >> > > >>
> >> > > >> Thanks JB and Davor for all your hard work putting together this
> >> > > release!
> >> > > >>
> >> > > >> On Wed, Jun 8, 2016, 11:02 PM Jean-Baptiste Onofré <
> j...@nanthrax.net
> >> >
> >> > > >> wrote:
> >> > > >>
> >> > > >> By the way, I forgot to mention that we will create a
> >> 0.1.0-incubating
> >> > > >>> tag (kind of alias to RC3) when the vote passed.
> >> > > >>>
> >> > > >>> Regards
> >> > > >>> JB
> >> > > >>>
> >> > > >>> On 06/09/2016 01:20 AM, Davor Bonaci wrote:
> >> > > >>>
> >> > > >>>> Hi everyone,
> >> > > >>>> Here's the first vote for the first release of Apache Beam --
> >> > version
> >> > > >>>> 0.1.0-incubating!
> >> > > >>>>
> >> > > >>>> As a reminder, we aren't looking for any specific new
> >> functionality,
> >> > > but
> >> > > >>>> would like to release the existing code, get something to our
> >> users'
> >> > > >>>>
> >> > > >>> hands,
> >> > > >>>
> >> > > >>>> and test the processes. Previous discussions and iterations on
> >> this
> >> > > >>>>
> >> > > >>> release
> >> > > >>>
> >> > > >>>> have been archived on the dev@ mailing list.
> >> > > >>>>
> >> > > >>>> The complete staging area is available for your review, which
> >> > > includes:
> >> > > >>>> * the official Apache source release to be deployed to
> >> > > dist.apache.org
> >> > > >>>>
> >> > > >>> [1],
> >> > > >>>
> >> > > >>>> and
> >> > > >>>> * all artifacts to be deployed to the Maven Central Repository
> >> [2].
> >> > > >>>>
> >> > > >>>> This corresponds to the tag "v0.1.0-incubating-RC3" in source
> >>

[VOTE] Release version 0.1.0-incubating

2016-06-08 Thread Davor Bonaci
Hi everyone,
Here's the first vote for the first release of Apache Beam -- version
0.1.0-incubating!

As a reminder, we aren't looking for any specific new functionality, but
would like to release the existing code, get something to our users' hands,
and test the processes. Previous discussions and iterations on this release
have been archived on the dev@ mailing list.

The complete staging area is available for your review, which includes:
* the official Apache source release to be deployed to dist.apache.org [1],
and
* all artifacts to be deployed to the Maven Central Repository [2].

This corresponds to the tag "v0.1.0-incubating-RC3" in source control, [3].

Please vote as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

For those of us enjoying our first voting experience -- the release
checklist is here [4]. This is a "package release"-type of the Apache
voting process [5]. As customary, the vote will be open for 72 hours. It is
adopted by majority approval with at least 3 PPMC affirmative votes. If
approved, the proposal will be presented to the Apache Incubator for their
review.
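
As a rough illustration of the adoption rule described above, here is a minimal Python sketch of the tally. The `Vote` type and `release_vote_passes` function are hypothetical names, and the sketch assumes that only binding (PPMC) votes count toward both the majority and the minimum of 3 approvals; the voting guide [5] remains the authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 to approve, -1 to reject
    binding: bool   # assumption: True when cast by a PPMC member


def release_vote_passes(votes, min_binding_approvals=3):
    """Return True if the release vote is adopted.

    Assumption: only binding votes are tallied. The vote passes when there
    are at least `min_binding_approvals` binding +1 votes and more binding
    +1 votes than binding -1 votes.
    """
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value > 0)
    rejections = sum(1 for v in binding if v.value < 0)
    return approvals >= min_binding_approvals and approvals > rejections
```

Non-binding votes are still welcome as a signal from the community; in this sketch they simply do not change the outcome.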

Thanks,
Davor

[1]
https://repository.apache.org/content/repositories/orgapachebeam-1002/org/apache/beam/beam-parent/0.1.0-incubating/beam-parent-0.1.0-incubating-source-release.zip
[2] https://repository.apache.org/content/repositories/orgapachebeam-1002/
[3] https://github.com/apache/incubator-beam/tree/v0.1.0-incubating-RC3
[4] http://incubator.apache.org/guides/releasemanagement.html#check-list
[5] http://www.apache.org/foundation/voting.html


Re: 0.1.0-incubating release

2016-06-08 Thread Davor Bonaci
The third release candidate is now available for everyone's review [1],
which should be incorporating all feedback so far.

Please comment if there's additional feedback, as we are about to start the
voting process.

[1] https://repository.apache.org/content/repositories/orgapachebeam-1002

On Wed, Jun 8, 2016 at 12:10 PM, P. Taylor Goetz  wrote:

> Thanks for the clarification JB. In the projects I’ve been involved with,
> I’ve not seen that practice.
>
> As long as the resulting release ends up on dist.a.o I don’t think it’s a
> problem.
>
> -Taylor
>
>
> > On Jun 8, 2016, at 12:49 AM, Jean-Baptiste Onofré 
> wrote:
> >
> > Hi Taylor,
> >
> > Just to be clear, in most other projects, we stage the distributions on
> > repository. We upload the distro and signatures to dist.apache.org only
> > when the vote has passed.
> >
> > Basically, the release process I talked with Davor (and that I will
> document) is:
> > - Tag and stage using mvn release:prepare release:perform
> > - Close repo
> > - Start vote
> > - If passed, forward vote to incubator
> > - If passed, close repo
> > - Upload distro to dist
> > - Announce the release (mailing lists, website)
> >
> > It's based on what I do in Karaf, ServiceMix, etc.
> >
> > Regards
> > JB
> >
> > On 06/08/2016 02:39 AM, P. Taylor Goetz wrote:
> >> Out of curiosity, is there a reason for distributing the release on
> repository.a.o vs. dist.a.o?
> >>
> >> In my experience repository.a.o has traditionally been used for maven
> artifacts, and dist.a.o has been for release artifacts (source archives and
> convenience binaries).
> >>
> >> I'd be happy to help with documenting the process.
> >>
> >> I ask because this might come up during an IPMC release vote.
> >>
> >> -Taylor
> >>
> >>> On Jun 1, 2016, at 9:46 PM, Davor Bonaci 
> wrote:
> >>>
> >>> Hi everyone!
> >>> We've started the release process for our first release,
> 0.1.0-incubating.
> >>>
> >>> To recap previous discussions, we don't have particular functional
> goals
> >>> for this release. Instead, we'd like to make available what's
> currently in
> >>> the repository, as well as work through the release process.
> >>>
> >>> With this in mind, we've:
> >>> * branched off the release branch [1] at master's commit 8485272,
> >>> * updated master to prepare for the second release, 0.2.0-incubating,
> >>> * built the first release candidate, RC1, and deployed it to a staging
> >>> repository [2].
> >>>
> >>> We are not ready to start a vote just yet -- we've already identified
> a few
> >>> issues worth fixing. That said, I'd like to invite everybody to take a
> peek
> >>> and comment. I'm hoping we can address as many issues as possible
> before we
> >>> start the voting process.
> >>>
> >>> Please let us know if you see any issues.
> >>>
> >>> Thanks,
> >>> Davor
> >>>
> >>> [1]
> https://github.com/apache/incubator-beam/tree/release-0.1.0-incubating
> >>> [2]
> https://repository.apache.org/content/repositories/orgapachebeam-1000/
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
>


Re: 0.1.0-incubating release

2016-06-07 Thread Davor Bonaci
The second release candidate is available for everyone's review [1].

We plan to call for a vote shortly; please comment if there's any
additional feedback.

[1] https://repository.apache.org/content/repositories/orgapachebeam-1001

On Tue, Jun 7, 2016 at 9:33 AM, Kenneth Knowles 
wrote:

> +1
>
> Lovely. Very readable.
>
> The "-parent" artifacts are just leaked implementation details of our build
> configuration that no one should ever actually reference, right?
>
> Kenn
>
> On Tue, Jun 7, 2016 at 8:54 AM, Dan Halperin 
> wrote:
>
> > +2! This seems most concordant with other Apache products and the most
> > future-proof.
> >
> > On Mon, Jun 6, 2016 at 9:35 PM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > +1
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 06/07/2016 02:49 AM, Davor Bonaci wrote:
> > >
> > >> After a few rounds of discussions and examining patterns of other
> > >> projects,
> > >> I think we are converging towards:
> > >>
> > >> * A flat group structure, where all artifacts belong to the
> > >> org.apache.beam
> > >> group.
> > >> * Prefix all artifact ids with "beam-".
> > >> * Name artifacts according to the existing directory/module layout:
> > >> beam-sdks-java-core, beam-runners-google-cloud-dataflow-java, etc.
> > >> * Suffix all parents with "-parent", e.g., "beam-parent",
> > >> "sdks-java-parent", etc.
> > >> * Create a "distributions" module, for the purpose of packaging the
> > source
> > >> code for the ASF release.
> > >>
> > >> I believe this approach takes into account everybody's feedback so
> far,
> > >> and
> > >> I think opposing positions have been retracted.
> > >>
> > >> Please comment if that's not the case, or if there are any additional
> > >> points that we may have missed. If not, this is implemented in pending
> > >> pull
> > >> requests #420 and #423.
> > >>
> > >> Thanks!
> > >>
> > >> On Fri, Jun 3, 2016 at 9:59 AM, Thomas Weise 
> > >> wrote:
> > >>
> > >> Another consideration for potential future packaging/distribution
> > >>> solutions
> > >>> is how the artifacts line up as files in a flat directory. For that
> it
> > >>> may
> > >>> be good to have a common prefix in the artifactId and unique
> > artifactId.
> > >>>
> > >>> The name for the source archive (when relying on ASF parent POM) can
> > also
> > >>> be controlled without expanding the artifactId:
> > >>>
> > >>>   
> > >>>  
> > >>>
> > >>>  maven-assembly-plugin
> > >>>  
> > >>>apache-beam
> > >>>  
> > >>>
> > >>>  
> > >>>   
> > >>>
> > >>> Thanks,
> > >>> Thomas
> > >>>
> > >>> On Fri, Jun 3, 2016 at 9:39 AM, Davor Bonaci
>  > >
> > >>> wrote:
> > >>>
> > >>> BEAM-315 is definitely important. Normally, I'd always advocate for
> > >>>>
> > >>> holding
> > >>>
> > >>>> the release to pick that fix. For the very first release, however,
> I'd
> > >>>> prefer to proceed to get something out there and test the process.
> As
> > >>>> you
> > >>>> said, we can address this rather quickly once we have the fix merged
> > in.
> > >>>>
> > >>>> In terms of Maven coordinates, there are two basic approaches:
> > >>>> * flat structure, where artifacts live under "org.apache.beam" group
> > and
> > >>>> are differentiated by their artifact id.
> > >>>> * hierarchical structure, where we use different groups for
> different
> > >>>>
> > >>> types
> > >>>
> > >>>> of artifacts (org.apache.beam.sdks; org.apache.beam.runners).
> > >>>>
> > > >>>> There are pros and cons on both sides of the argument. Different
> > >>>> projects made different choices.

Re: 0.1.0-incubating release

2016-06-06 Thread Davor Bonaci
After a few rounds of discussions and examining patterns of other projects,
I think we are converging towards:

* A flat group structure, where all artifacts belong to the org.apache.beam
group.
* Prefix all artifact ids with "beam-".
* Name artifacts according to the existing directory/module layout:
beam-sdks-java-core, beam-runners-google-cloud-dataflow-java, etc.
* Suffix all parents with "-parent", e.g., "beam-parent",
"sdks-java-parent", etc.
* Create a "distributions" module, for the purpose of packaging the source
code for the ASF release.
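
To make the naming convention above concrete, here is a minimal Python sketch that derives Maven coordinates from a module directory. The function name `maven_coordinates` and the example paths (`sdks/java/core`, `runners/google-cloud-dataflow-java`) are assumptions for illustration; only the group id and the resulting artifact ids come from this thread. Parent POMs are deliberately left out, since their naming is handled separately by the "-parent" suffix rule.

```python
GROUP_ID = "org.apache.beam"  # flat group structure: one group for all artifacts


def maven_coordinates(module_path: str) -> str:
    """Build 'groupId:artifactId' for a module directory.

    The artifact id is the directory path with '/' replaced by '-' and
    prefixed with 'beam-'. The path layout is an assumption for illustration.
    """
    parts = [p for p in module_path.strip("/").split("/") if p]
    if not parts:
        raise ValueError("module_path must name at least one directory")
    return f"{GROUP_ID}:{'-'.join(['beam', *parts])}"


print(maven_coordinates("sdks/java/core"))
print(maven_coordinates("runners/google-cloud-dataflow-java"))
```

Because every artifact id carries the "beam-" prefix and the full module path, the resulting jars stay unique and sort together when placed in a single flat directory.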

I believe this approach takes into account everybody's feedback so far, and
I think opposing positions have been retracted.

Please comment if that's not the case, or if there are any additional
points that we may have missed. If not, this is implemented in pending pull
requests #420 and #423.

Thanks!

On Fri, Jun 3, 2016 at 9:59 AM, Thomas Weise  wrote:

> Another consideration for potential future packaging/distribution solutions
> is how the artifacts line up as files in a flat directory. For that it may
> be good to have a common prefix in the artifactId and unique artifactId.
>
> The name for the source archive (when relying on ASF parent POM) can also
> be controlled without expanding the artifactId:
>
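>  <build>
>    <plugins>
>      <plugin>
>        <artifactId>maven-assembly-plugin</artifactId>
>        <configuration>
>          <finalName>apache-beam</finalName>
>        </configuration>
>      </plugin>
>    </plugins>
>  </build>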
>
> Thanks,
> Thomas
>
> On Fri, Jun 3, 2016 at 9:39 AM, Davor Bonaci 
> wrote:
>
> > BEAM-315 is definitely important. Normally, I'd always advocate for
> holding
> > the release to pick that fix. For the very first release, however, I'd
> > prefer to proceed to get something out there and test the process. As you
> > said, we can address this rather quickly once we have the fix merged in.
> >
> > In terms of Maven coordinates, there are two basic approaches:
> > * flat structure, where artifacts live under "org.apache.beam" group and
> > are differentiated by their artifact id.
> > * hierarchical structure, where we use different groups for different
> types
> > of artifacts (org.apache.beam.sdks; org.apache.beam.runners).
> >
> > There are pros and cons on both sides of the argument. Different
> > projects made different choices. Flat structure is easier to find and
> > navigate, but often breaks down with too many artifacts. Hierarchical
> > structure is just the opposite.
> >
> > On my end, the only important thing is consistency. We used to have it,
> > and it got broken by PR #365. This part should be fixed -- we should
> > either finish the vision of the hierarchical structure, or roll back that
> > PR to get back to a fully flat structure.
> >
> > My general biases tend to be:
> > * hierarchical structure, since we have many artifacts already.
> > * short identifiers; no need to repeat a part of the group id in the
> > artifact id.
> >
> > On Fri, Jun 3, 2016 at 4:03 AM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Hi Max,
> > >
> > > I discussed with Davor yesterday. Basically, I proposed:
> > >
> > > 1. To rename all parent with a prefix (beam-parent,
> flink-runner-parent,
> > > spark-runner-parent, etc).
> > > 2. For the groupId, I prefer to use different groupId, it's clearer to
> > me,
> > > and it's exactly the usage of the groupId. Some projects use a single
> > > groupId (spark, hadoop, etc), others use multiple (camel, karaf,
> > activemq,
> > > etc). I prefer different groupIds but ok to go back to single one.
> > >
> > > Anyway, I'm preparing a PR to introduce a new Maven module:
> > > "distribution". The purpose is to address both BEAM-319 (first) and
> > > BEAM-320 (later). It's where we will be able to define the different
> > > distributions we plan to publish (source and binaries).
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 06/03/2016 11:02 AM, Maximilian Michels wrote:
> > >
> > >> Thanks for getting us ready for the first release, Davor! We would
> > >> like to fix BEAM-315 next week. Is there already a timeline for the
> > >> first release? If so, we could also address this in a minor release.
> > >> Releasing often will give us some experience with our release process
> > >> :)
> > >>
> > >> I would like everyone to think about the artifact names and group ids
> > >> again.

Re: Apache Beam for Python

2016-06-03 Thread Davor Bonaci
Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré 
wrote:

> Absolutely ;)
>
>
> On 06/03/2016 03:51 PM, James Malone wrote:
>
>> Hey Silviu!
>>
>> I think JB is proposing we create a python directory in the sdks directory
>> in the root repository (and modify the configuration files accordingly):
>>
>> https://github.com/apache/incubator-beam/tree/master/sdks
>>
>> This Beam document here titled "Apache Beam (Incubating): Repository
>> Structure" details the proposed repository structure and may be useful:
>>
>>
>>
>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>
>> Best,
>>
>> James
>>
>>
>>
>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
>> 
>> wrote:
>>
>>> Hi JB,
>>> Thanks for the welcome! I come from the Python land so I am not quite
>>> familiar with Maven. What do you mean by a Maven module? You mean an
>>> artifact so you can install things? In Python, people are used to
>>> packages
>>> downloaded from PyPI (pypi.python.org -- which is sort of Maven for
>>> Python). Whatever is the standard way of doing things in Apache we'll do
>>> it. Just asking for clarifications.
>>>
>>> By the way this discussion is very useful since we will have to iron out
>>> several details like this.
>>> Thanks,
>>> Silviu
>>>
>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>>> Hi Silviu,
>>>>
>>>> thanks for detailed update and great work !
>>>>
>>>> I would advise to create a:
>>>>
>>>> sdks/python
>>>>
>>>> Maven module to store the Python SDK.
>>>>
>>>> WDYT ?
>>>>
>>>> By the way, welcome aboard and great to have you all guys in the team !
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
>>>>> working on the Python SDK. As the original Beam proposal
>>>>> (https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
>>>>> planning to merge the Python SDK into Beam. The Python SDK is in an
>>>>> early stage of development (alpha milestone) and so this is a good time
>>>>> to move the code without causing too much disruption to our customers.
>>>>> Additionally, this enables the Beam community to contribute as soon as
>>>>> possible.
>>>>>
>>>>> The current state of the SDK is as follows:
>>>>>
>>>>> - Open-sourced at
>>>>>   https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>> - Model: All main concepts are present.
>>>>> - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
>>>>>   and has a framework for adding additional sources and sinks.
>>>>> - Runners: SDK has two pipeline runners: direct runner (in process,
>>>>>   local execution) and Cloud Dataflow runner for batch pipelines (submit
>>>>>   job to Google Dataflow service). The current direct runner is bounded
>>>>>   only (batch execution) but there is work in progress to support
>>>>>   unbounded (as in Java).
>>>>> - Testing: The code base has unit test coverage for all the modules and
>>>>>   several integration and end to end tests (similar in coverage to the
>>>>>   Java SDK). Streaming is not well tested end to end yet since Cloud
>>>>>   Dataflow focused first on batch.
>>>>> - Docs: We have matching Python documentation for the features currently
>>>>>   supported by Cloud Dataflow. The docs are on cloud.google.com (access
>>>>>   only by whitelist due to the alpha stage of the project). Devin is
>>>>>   working on the transition of all docs to Apache.
>>>>>
>>>>> In the next days/weeks we would like to prepare and start migrating the
>>>>> code and you should start seeing some pull requests. We also hope that
>>>>> the Beam community will shape the SDK going forward. In particular, all
>>>>> the model improvements implemented for Java (Runner API, etc.) will have
>>>>> equivalents in Python once they stabilize. If you have any advice before
>>>>> we start the journey please let us know.
>>>>>
>>>>> The team that will join the Beam effort consists of me (Silviu
>>>>> Calinoiu), Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but
>>>>> not least Robert Bradshaw (who is already an Apache Beam committer).
>>>>>
>>>>> So let us know what you think!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Silviu
>>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [VOTE] groupId/artifactId naming & layout

2016-06-03 Thread Davor Bonaci
This is not a great vote proposal for several reasons:
* "Use the current layout" is ambiguous, because it is inconsistent (it is
now partly flat and partly hierarchical).
* Getting the outcome won't move us much closer to the resolution, given
that there are several sub-variants in each option.
* We have not laid out advantages, disadvantages, and consequences of each
option for everyone to make an informed decision.
* It is premature: we haven't tried to reach a consensus or explored
alternatives. Three hours and just a few emails is far too short an interval
between an issue being raised and a call for a vote.

I'd suggest trying to reach consensus on the original thread first, and
calling for a vote if and when it is needed.

On Fri, Jun 3, 2016 at 5:15 AM, Amit Sela  wrote:

> +1 for Option2
>
> On Fri, Jun 3, 2016 at 2:09 PM Jean-Baptiste Onofré 
> wrote:
>
> > As said in my previous e-mail, just proposed PR #416.
> >
> > Let's start a vote for groupId and artifactId naming.
> >
> > [ ] Option1: use the current layout (multiple groupId, artifactId
> > relative to groupId)
> > [ ] Option2: use unique org.apache.beam groupId and rename artifactId
> > with a prefix (beam-parent/apache-beam, flink-runner, spark-runner, etc)
> >
> > Regards
> > JB
> >
> > On 06/03/2016 01:03 PM, Jean-Baptiste Onofré wrote:
> > > Hi Max,
> > >
> > > I discussed with Davor yesterday. Basically, I proposed:
> > >
> > > 1. To rename all parent with a prefix (beam-parent,
> flink-runner-parent,
> > > spark-runner-parent, etc).
> > > 2. For the groupId, I prefer to use different groupId, it's clearer to
> > > me, and it's exactly the usage of the groupId. Some projects use a
> > > single groupId (spark, hadoop, etc), others use multiple (camel, karaf,
> > > activemq, etc). I prefer different groupIds but ok to go back to single
> > > one.
> > >
> > > Anyway, I'm preparing a PR to introduce a new Maven module:
> > > "distribution". The purpose is to address both BEAM-319 (first) and
> > > BEAM-320 (later). It's where we will be able to define the different
> > > distributions we plan to publish (source and binaries).
> > >
> > > Regards
> > > JB
> > >
> > > On 06/03/2016 11:02 AM, Maximilian Michels wrote:
> > >> Thanks for getting us ready for the first release, Davor! We would
> > >> like to fix BEAM-315 next week. Is there already a timeline for the
> > >> first release? If so, we could also address this in a minor release.
> > >> Releasing often will give us some experience with our release process
> > >> :)
> > >>
> > >> I would like everyone to think about the artifact names and group ids
> > >> again. "parent" and "flink" are not very suitable names for the Beam
> > >> parent or the Flink Runner artifact (same goes for the Spark Runner).
> > >> I'd prefer "beam-parent", "flink-runner", and "spark-runner" as
> > >> artifact ids.
> > >>
> > >> One might think of Maven GroupIds as a sort of hierarchy but they're
> > >> not. They're just an identifier. Renaming the parent pom to
> > >> "apache-beam" or "beam-parent" would give us the old naming scheme
> > >> which used flat group ids (before [1]).
> > >>
> > >> In the end, I guess it doesn't matter too much if we document the
> > >> naming schemes accordingly. What matters is that we use a consistent
> > >> naming scheme.
> > >>
> > >> Cheers,
> > >> Max
> > >>
> > >> [1] https://issues.apache.org/jira/browse/BEAM-287
> > >>
> > >>
> > >> On Thu, Jun 2, 2016 at 4:00 PM, Jean-Baptiste Onofré
> > >> wrote:
> > >>> Actually, I think we can fix both issue in one commit.
> > >>>
> > >>> What do you think about renaming the main parent POM with:
> > >>> groupId: org.apache.beam
> > >>> artifactId: apache-beam
> > >>>
> > >>> ?
> > >>>
> > >>> Thanks to that, the source distribution will be named
> > >>> apache-beam-xxx-sources.zip and it would be clearer to dev.
> > >>>
> > >>> Thoughts ?
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>>
> > >>> On 06/02/2016 03:10 PM, Jean-Bapt

Re: 0.1.0-incubating release

2016-06-03 Thread Davor Bonaci
BEAM-315 is definitely important. Normally, I'd always advocate for holding
the release to pick that fix. For the very first release, however, I'd
prefer to proceed to get something out there and test the process. As you
said, we can address this rather quickly once we have the fix merged in.

In terms of Maven coordinates, there are two basic approaches:
* flat structure, where artifacts live under "org.apache.beam" group and
are differentiated by their artifact id.
* hierarchical structure, where we use different groups for different types
of artifacts (org.apache.beam.sdks; org.apache.beam.runners).

There are pros and cons on both sides of the argument. Different
projects made different choices. Flat structure is easier to find and
navigate, but often breaks down with too many artifacts. Hierarchical
structure is just the opposite.

On my end, the only important thing is consistency. We used to have it, and
it got broken by PR #365. This part should be fixed -- we should either
finish the vision of the hierarchical structure, or roll back that PR to get
back to a fully flat structure.

My general biases tend to be:
* hierarchical structure, since we have many artifacts already.
* short identifiers; no need to repeat a part of the group id in the
artifact id.
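
To make the two layouts above concrete, the following Java sketch models a
Maven coordinate and prints the repository directory and jar file name each
layout would produce. It is an illustration only, not Beam code: the
`Coordinates` class is hypothetical, and the artifact ids `flink` and
`flink-runner` and the version `0.1.0-incubating` are borrowed from this
thread purely as sample values. The only Maven behavior it relies on is the
standard repository layout, in which the groupId becomes a directory path and
the file name is built from artifactId and version alone.

```java
/** Illustrative model of a Maven coordinate; hypothetical, not part of Beam. */
final class Coordinates {
    final String groupId;
    final String artifactId;
    final String version;

    Coordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    /** Jar file name in the standard layout; the groupId is not part of it. */
    String jarFileName() {
        return artifactId + "-" + version + ".jar";
    }

    /** Directory holding the artifact in a standard Maven repository layout. */
    String repositoryDirectory() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version;
    }

    public static void main(String[] args) {
        String version = "0.1.0-incubating"; // sample value taken from this thread
        Coordinates[] samples = {
            // Flat: a single group, with a descriptive artifact id.
            new Coordinates("org.apache.beam", "flink-runner", version),
            // Hierarchical: the group carries the category, the artifact id stays short.
            new Coordinates("org.apache.beam.runners", "flink", version),
        };
        for (Coordinates c : samples) {
            System.out.println(c.repositoryDirectory() + "/" + c.jarFileName());
        }
    }
}
```

In the hierarchical case the jar itself is named only after the short
artifact id, so the category information lives in the directory path and is
lost as soon as the file is copied somewhere else.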

On Fri, Jun 3, 2016 at 4:03 AM, Jean-Baptiste Onofré 
wrote:

> Hi Max,
>
> I discussed with Davor yesterday. Basically, I proposed:
>
> 1. To rename all parent with a prefix (beam-parent, flink-runner-parent,
> spark-runner-parent, etc).
> 2. For the groupId, I prefer to use different groupId, it's clearer to me,
> and it's exactly the usage of the groupId. Some projects use a single
> groupId (spark, hadoop, etc), others use multiple (camel, karaf, activemq,
> etc). I prefer different groupIds but ok to go back to single one.
>
> Anyway, I'm preparing a PR to introduce a new Maven module:
> "distribution". The purpose is to address both BEAM-319 (first) and
> BEAM-320 (later). It's where we will be able to define the different
> distributions we plan to publish (source and binaries).
>
> Regards
> JB
>
>
> On 06/03/2016 11:02 AM, Maximilian Michels wrote:
>
>> Thanks for getting us ready for the first release, Davor! We would
>> like to fix BEAM-315 next week. Is there already a timeline for the
>> first release? If so, we could also address this in a minor release.
>> Releasing often will give us some experience with our release process
>> :)
>>
>> I would like everyone to think about the artifact names and group ids
>> again. "parent" and "flink" are not very suitable names for the Beam
>> parent or the Flink Runner artifact (same goes for the Spark Runner).
>> I'd prefer "beam-parent", "flink-runner", and "spark-runner" as
>> artifact ids.
>>
>> One might think of Maven GroupIds as a sort of hierarchy but they're
>> not. They're just an identifier. Renaming the parent pom to
>> "apache-beam" or "beam-parent" would give us the old naming scheme
>> which used flat group ids (before [1]).
>>
>> In the end, I guess it doesn't matter too much if we document the
>> naming schemes accordingly. What matters is that we use a consistent
>> naming scheme.
>>
>> Cheers,
>> Max
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-287
>>
>>
>> On Thu, Jun 2, 2016 at 4:00 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Actually, I think we can fix both issue in one commit.
>>>
>>> What do you think about renaming the main parent POM with:
>>> groupId: org.apache.beam
>>> artifactId: apache-beam
>>>
>>> ?
>>>
>>> Thanks to that, the source distribution will be named
>>> apache-beam-xxx-sources.zip and it would be clearer to dev.
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 06/02/2016 03:10 PM, Jean-Baptiste Onofré wrote:
>>>
>>>>
>>>> Another annoying thing is the main parent POM artifactId.
>>>>
>>>> Now, it's just "parent". What do you think about renaming to
>>>> "beam-parent" ?
>>>>
>>>> Regarding the source distribution name, I would cancel this staging to
>>>> fix that (I will have a PR ready soon).
>>>>
>>>> Thoughts ?
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 06/02/2016 03:46 AM, Davor Bonaci wrote:
>>>>
>>>>>
>>>>> Hi everyone!
>>>>> We

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #60

2016-06-02 Thread Davor Bonaci
New type of error; investigation in progress.

On Thu, Jun 2, 2016 at 12:30 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  >
>
> Changes:
>
> [aljoscha.krettek] [BEAM-295] Remove erroneous close() calls in Flink
> Create Sources
>
> [bchambers] Forward port changes to GC holds
>
> [davor] Update pom.xml files formatting
>
> [davor] [maven-release-plugin] prepare branch release-0.1.0-incubating
>
> [davor] [maven-release-plugin] prepare for next development iteration
>
> [dhalperi] Use Structural Value keys instead of User Values
>
> --
> [...truncated 5106 lines...]
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/ActiveWindowSet.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/MapAggregatorValues.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/DoFnRunnerBase.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/FileIOChannelFactory.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/MergingActiveWindowSet.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/gcsfs/package-info.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/gcsfs/GcsPath.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/PropertyNames.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/WeightedValue.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/CombineContextFactory.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/AttemptBoundedExponentialBackOff.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/ShardingWritableByteChannel.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/FinishedTriggersBitSet.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/ExposedByteArrayInputStream.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/SystemDoFnInternal.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/TriggerRunner.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/IOChannelFactory.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/AssignWindows.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/NonMergingActiveWindowSet.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/WindowingInternals.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/ExecutionContext.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/DirectModeExecutionContext.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/WindowTracing.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/PCollectionViews.java
> longer than 100 characters.
> [WARNING] Entry:
> apache-beam-0.2.0-incubating-SNAPSHOT/core/src/main/java/org/apache/beam/sdk/util/CounterAggregator.java
> longer than 100 characters.
> [INFO] Building zip: <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/sdks/java/target/apache-beam-0.2.0-incubating-SNAPSHOT-src.zip
> >
> [INFO]
> [INFO] --- apache-rat-plugin:0.11:check (default) @ java-sdk-parent ---
> [INFO] 51 implicit excludes (use -debug for more details).
> [INFO] Exclude: **/target/**/*
> [INFO] Exclude

0.1.0-incubating release

2016-06-01 Thread Davor Bonaci
Hi everyone!
We've started the release process for our first release, 0.1.0-incubating.

To recap previous discussions, we don't have particular functional goals
for this release. Instead, we'd like to make available what's currently in
the repository, as well as work through the release process.

With this in mind, we've:
* branched off the release branch [1] at master's commit 8485272,
* updated master to prepare for the second release, 0.2.0-incubating,
* built the first release candidate, RC1, and deployed it to a staging
repository [2].

We are not ready to start a vote just yet -- we've already identified a few
issues worth fixing. That said, I'd like to invite everybody to take a peek
and comment. I'm hoping we can address as many issues as possible before we
start the voting process.

Please let us know if you see any issues.

Thanks,
Davor

[1] https://github.com/apache/incubator-beam/tree/release-0.1.0-incubating
[2] https://repository.apache.org/content/repositories/orgapachebeam-1000/


Re: [PROPOSAL] Beam FAQ

2016-05-31 Thread Davor Bonaci
Javadoc publication should be a part of every release. As soon as the first
release is complete, Javadoc will be on our website.

On Sun, May 29, 2016 at 10:35 PM, Jean-Baptiste Onofré 
wrote:

> Thanks Devin,
>
> gonna take a look !
>
> Regards
> JB
>
>
> On 05/28/2016 02:20 AM, Devin Donnelly wrote:
>
>> The relevant file you're looking for, and the one that's constantly
>> updated, is:
>>
>> /docs/programming-guide.md
>>
>> On Fri, May 27, 2016 at 5:20 PM, Devin Donnelly 
>> wrote:
>>
>>> Here's the URL of my fork, so you can see what it looks like so far:
>>>
>>> https://github.com/devin-donnelly/incubator-beam-site/tree/beam-pg
>>>
>>> On Mon, May 23, 2016 at 8:02 AM, Jean-Baptiste Onofré
>>> wrote:
>>>
>>>> Agree, it would be great to have such user guide + a started guide for
>>>> Beam.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 05/23/2016 04:41 PM, Jesse Anderson wrote:
>>>>
>>>>> I think Josh's Crunch User Guide is a great example of what a user
>>>>> guide should cover. https://crunch.apache.org/user-guide.html
>>>>>
>>>>> On Mon, May 23, 2016 at 2:00 AM Ismaël Mejía
>>>>> wrote:
>>>>>
>>>>>> Ok, I agree Davor for end users a getting started guide is not only
>>>>>> important but I would say critical at this moment, the FAQ can be an
>>>>>> effort run in parallel. The project is incubating so the FAQ would be
>>>>>> in its early state, and ideally we must not need an enormous FAQ,
>>>>>> however this project mixes many different technologies, and I can
>>>>>> easily imagine frequent questions about technical details on Sources,
>>>>>> Sinks, and Runners e.g. my question on how to reuse the context on the
>>>>>> spark runner is a good example, it is not general enough to put it as
>>>>>> a default in the runner, it is not simple enough for a getting started
>>>>>> guide, but a good amount of users will have to deal with it once they
>>>>>> write tests for their pipelines.
>>>>>>
>>>>>> Devin, thanks for writing, I am interested in the draft, can you
>>>>>> please share the URL of your fork, so other people can eventually take
>>>>>> a look/contribute.
>>>>>>
>>>>>> Ismael
>>>>>>
>>>>>> On Fri, May 20, 2016 at 7:35 PM, Devin Donnelly <
>>>>>> ddonne...@google.com.invalid> wrote:
>>>>>>
>>>>>>> FYI: User documentation draft (the Beam Programming Guide) is well
>>>>>>> underway. I'm regularly pushing stuff out to a fork of the Beam
>>>>>>> website repo if anyone wants a sneak peek.
>>>>>>> On May 20, 2016 9:37 AM, "Davor Bonaci"
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We are missing a basic getting started guide along with the rest of
>>>>>>>> user documentation. I think we should work on this first.
>>>>>>>>
>>>>>>>> FAQ is a great idea for things that aren't or cannot be covered by
>>>>>>>> those documents -- but, we cannot really start that before we have
>>>>>>>> at least a draft version of the previous.
>>>>>>>>
>>>>>>>> Wiki hosting would be owned by Infra, if we choose to go down that
>>>>>>>> path at some point.
>>>>>>>>
>>>>>>>> On Fri, May 20, 2016 at 1:34 AM, Jean-Baptiste Onofré
>>>>>>>> <j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> good idea for the FAQ. Not sure for the wiki: I would prefer kind
>>>>>>>>> of governance and review using the website.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 05/20/2016 09:24 AM, Ismaël Mejía wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I have stumbled with some issues while trying to execute
>>>>>>>>>> pipelines with all the different runners and I was wondering if
>>>>>>>>>> we need to create a Frequently Asked Questions (FAQ) section on
>>>>>>>>>> the website. Maybe it would be better to create such thing as a
>>>>>>>>>> wiki so we can contribute faster.
>>>>>>>>>>
>>>>>>>>>> What do you think ? And what way you think is the better to do so
>>>>>>>>>> (infra) ?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ismaël
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbono...@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: BEAM-64

2016-05-23 Thread Davor Bonaci
[ I'll reply a little bit and leave the details to Dan. ]

First, Frederick, welcome! We look forward to your contributions to Beam.

At first glance, BEAM-64 was a little under-specified. Let me try to
clarify what was intended:
* Add a pipeline-level registry of compression formats with the corresponding
logic to compress/decompress. This is perhaps somewhat similar in design to
CoderRegistry.
* Remove the current logic from CompressedSource, but keep the ability to
override the registry.
* Propagate the ability to override the registry to the users of
CompressedSource, one of which is TextIO.

From the user perspective, the experience would be as follows:
* Add custom compressed formats to the registry, just after creating the
pipeline.
* Use any (applicable) IO without any special considerations. Compression
is handled automatically by the filename extension.
* Alternatively, override the compression format at any source / sink.
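
The user experience described above can be sketched roughly as follows. This
is a hypothetical illustration in plain Java, not the BEAM-64 design or any
existing Beam API: `DecompressionRegistry`, `Decompressor`, and all method
names are invented for the example, and only JDK classes (`java.util.zip`,
`java.io`) are used. It covers the read side only; a real registry would
also need the compress side for sinks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical per-pipeline registry; none of these names are Beam APIs. */
final class DecompressionRegistry {

    /** Wraps a raw byte stream in a decompressing stream. */
    @FunctionalInterface
    interface Decompressor {
        InputStream wrap(InputStream raw) throws IOException;
    }

    private final Map<String, Decompressor> byExtension = new ConcurrentHashMap<>();

    DecompressionRegistry() {
        // Built-in default; users may add, replace, or remove entries.
        register(".gz", GZIPInputStream::new);
    }

    void register(String extension, Decompressor decompressor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), decompressor);
    }

    void unregister(String extension) {
        byExtension.remove(extension.toLowerCase(Locale.ROOT));
    }

    /** Chooses a decompressor by filename extension; unknown extensions pass through. */
    InputStream open(String filename, InputStream raw) throws IOException {
        int dot = filename.lastIndexOf('.');
        Decompressor d = dot < 0
            ? null
            : byExtension.get(filename.substring(dot).toLowerCase(Locale.ROOT));
        return d == null ? raw : d.wrap(raw);
    }

    /** Demo convenience: reads a whole (possibly compressed) file as UTF-8 text. */
    String readUtf8(String filename, byte[] content) {
        try (InputStream in = open(filename, new ByteArrayInputStream(content))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses a string. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        DecompressionRegistry registry = new DecompressionRegistry();
        // A custom format registered right after "pipeline creation";
        // here just an identity wrapper standing in for a real codec.
        registry.register(".raw", in -> in);
        System.out.println(registry.readUtf8("part-0.gz", gzip("hello")));
        System.out.println(
            registry.readUtf8("part-0.txt", "hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In this sketch, the per-source override from the last bullet could be as
simple as handing an explicit `Decompressor` to a source instead of letting
the registry choose by extension.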

Does this make sense?

On Sun, May 22, 2016 at 3:01 AM, Jean-Baptiste Onofré 
wrote:

> Hi Frederick,
>
> thanks for the update. We gonna take a look.
>
> Thanks !
> Regards
> JB
>
>
> On 05/21/2016 08:21 PM, Frederick Kautz wrote:
>
>> I implemented a potential solution to "[BEAM-64] General decompression
>> registry". It still needs a bit more attention with some of the finer
>> details, e.g. better error handling, better javadocs, adding unit tests.
>>
>> However, before I spend more time on it, I would like a review of the
>> general design.
>>
>>
>> https://github.com/apache/incubator-beam/compare/master...fkautz:beam-64?expand=1
>>
>> Design:
>>
>> I attempted to implement an approach that would require no code changes to
>> the users. There is an SDK interface change, but it should be backwards
>> compatible with existing code.
>>
>> TextIO.withCompression() is now capable of receiving a generic compressor
>> operator which includes all of the enums from before (AUTO, UNCOMPRESSED,
>> GZIP, BZIP2) but now can also receive a user or library implemented
>> compressor.
>>
>> CompressionType also receives a new getRegistry() which allows the user to
>> customize the behavior of AUTO. It allows the user to add, replace or
>> remove registered compressors as necessary.
>>
>> Here's a short list of changes:
>>
>> * Create a new CompressorOperator, compatible with Java 8 lambda
>> * CompressionType enum now implements CompressorType
>> * withCompression now takes a CompressorOperator
>> * Compression wrappers implementations moved from in-line code to
>> CompressionType enum
>> * Compression registry created
>> * AUTO now supports compressors registered with the registry
>>
>> Can someone review the design and give me feedback? If the design looks
>> good, I'll move forward on implementing tests, better exception error
>> messages, and improve the javadocs.
>>
>> Thanks,
>> Frederick
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Fewer number of minor/trivial issues

2016-05-23 Thread Davor Bonaci
For the "older" JIRA issues that you might be interested in, I'd suggest
simply commenting there. The original reporter or the component lead can
easily verify that the issue is still valid and assign it to you.

With that, Yash, welcome! We look forward to your contributions to the
project.

On Mon, May 23, 2016 at 11:35 AM, Jean-Baptiste Onofré 
wrote:

> Hi Ken,
>
> "starter" is fine too. We just have to "document" in the contrib guide ;)
>
> Regards
> JB
>
>
> On 05/23/2016 08:24 PM, Kenneth Knowles wrote:
>
>> We do already have a couple of issues labeled as "starter" for just this
>> purpose. I don't care much about the actual name; there are different
>> words
>> people think of ("easy-win", "starter", "newbie", "low-hanging-fruit") so
>> probably it would be useful to have a good Jira search linked from the
>> contribution guide.
>>
>> Kenn
>>
>> On Mon, May 23, 2016 at 9:10 AM, Scott Wegner
>> wrote:
>>
>>> I'm working on integrating FindBugs static analysis into our build, which
>>> has uncovered a long list of outstanding issues (JIRA, pull request).
>>> Once integrated, I'd like to triage the baseline issues which will be a
>>> great source of low-hanging-fruit bugs.
>>>
>>> On Sun, May 22, 2016 at 3:07 AM Yash Sharma wrote:
>>>
>>>> Great to know someone's on top of it already.
>>>> We should encourage creating /marking minor tasks which would give new
>>>> contributors opportunity to break the ice.
>>>>
>>>> Another place could be test cases for certain modules which will also
>>>> provide knowledge of code flow.
>>>>
>>>> -regards
>>>> On May 22, 2016 8:01 PM, "Jean-Baptiste Onofré"
>>>> wrote:
>>>>
>>>>> Hi Yash,
>>>>>
>>>>> During ApacheCon, I discussed with Davor to create a Jira tag: "low
>>>>> hanging fruit" ;)
>>>>> I did such tag in other Apache projects to encourage contribution.
>>>>>
>>>>> I see some potential Jira on this kind, especially documentation and
>>>>> examples.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 05/22/2016 11:56 AM, Yash Sharma wrote:
>>>>>
>>>>>> Hi Experts,
>>>>>> I have just been checking the Beam issue tracker but could not find
>>>>>> lot of minor/trivial tasks. Also many of the existing minor tasks are
>>>>>> quite old.
>>>>>> These tasks are very helpful for newbie contributions and it would be
>>>>>> great to see some minor tasks (or a newbie label), and would serve as
>>>>>> good starting points for fresh contributors.
>>>>>>
>>>>>> Thoughts ?
>>>>>>
>>>>>> Best Regards,
>>>>>> Yash
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Beam FAQ

2016-05-20 Thread Davor Bonaci
We are missing a basic getting started guide along with the rest of user
documentation. I think we should work on this first.

FAQ is a great idea for things that aren't or cannot be covered by those
documents -- but, we cannot really start that before we have at least a
draft version of the previous.

Wiki hosting would be owned by Infra, if we choose to go down that path at
some point.

On Fri, May 20, 2016 at 1:34 AM, Jean-Baptiste Onofré 
wrote:

> Hi,
>
> good idea for the FAQ. Not sure for the wiki: I would prefer kind of
> governance and review using the website.
>
> Regards
> JB
>
>
> On 05/20/2016 09:24 AM, Ismaël Mejía wrote:
>
>> Hello,
>>
>> I have stumbled on some issues while trying to execute pipelines with all
>> the different runners, and I was wondering if we need to create a
>> Frequently Asked Questions (FAQ) section on the website. Maybe it would be
>> better to create such a thing as a wiki so we can contribute faster.
>>
>> What do you think? And which way do you think is better for doing so
>> (infra)?
>>
>> Regards,
>> Ismaël
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Developing new components -- branches, maturity, and committers

2016-05-19 Thread Davor Bonaci
If anybody wants to experiment a little with a feature idea -- absolutely,
individual forked repositories are certainly an awesome place for such
attempts.

However, for something that is a significant undertaking, like a new runner
or new SDK, I think feature branches in the main repository make total
sense. We'd avoid the important disadvantages of forks: lower visibility,
a higher barrier for others to jump in, comment, and learn, and harder
testing, because Apache Jenkins wouldn't be able to run it automatically.

In summary, I think there's a spectrum of feature complexities and
longevity considerations. As such, I'd support being flexible as
appropriate, but have a default answer of starting with a feature branch in
the main repository for new major components.

On Thu, May 19, 2016 at 3:09 AM, Ismaël Mejía  wrote:

> I agree with Aljoscha about not putting the feature branches in the main
> repo. However, how can we make people aware of the new developments?
>
> -Ismaël
>
> On Thu, May 19, 2016 at 11:56 AM, Aljoscha Krettek 
> wrote:
>
> > +1
> >
> > When we say feature branch, are we talking about a branch in the main
> repo?
> > I would propose that feature branches live in the repos of the committers
> > who are working on a feature.
> >
> > On Thu, 19 May 2016 at 11:54 Jean-Baptiste Onofré 
> wrote:
> >
> > > +1
> > >
> > > it looks good to me.
> > >
> > > Regards
> > > JB
> > >
> > > On 05/19/2016 07:01 AM, Frances Perry wrote:
> > > > Hi Beamers --
> > > >
> > > > > I’m thrilled by the recent energy and activity on writing new Beam
> > > > > runners! But that also means it’s probably time for us to figure
> > > > > out how, as a community, we want to support this process. ;-)
> > > > >
> > > > > Back near the beginning, we had a thread [1] discussing that feature
> > > > > branches are the preferred way of doing development of features or
> > > > > components that may take a while to reach maturity. I think new
> > > > > components like runners and SDKs meet the bar to be started from a
> > > > > feature branch. (Other features, like an IO connector or library of
> > > > > PTransforms, might also qualify depending on complexity.)
> > > > >
> > > > > We should also lay out what it takes to be considered mature enough
> > > > > to be merged into master, since once that happens the component gets
> > > > > released to users and failing tests become blocking issues. Here are
> > > > > some initial thoughts to kick off the discussion...
> > > >
> > > > > In order to be merged into master, new components / major features
> > > > > should:
> > > > >
> > > > > - have at least 2 contributors interested in maintaining it, and 1
> > > > >   committer interested in supporting it
> > > > > - provide both end-user and developer-facing documentation
> > > > > - have at least a basic level of unit test coverage
> > > > > - run all existing applicable integration tests with other Beam
> > > > >   components and create additional tests as appropriate
> > > > >
> > > > > In addition...
> > > > >
> > > > > A runner should:
> > > > >
> > > > > - be able to handle a subset of the model that addresses a
> > > > >   significant set of use cases (aka. ‘traditional batch’ or
> > > > >   ‘processing time streaming’)
> > > > > - update the capability matrix with the current status
> > > > >
> > > > > An SDK* should:
> > > > >
> > > > > - provide the ability to construct graphs with all the basic
> > > > >   building blocks of the model (ParDo, GroupByKey, Window, Trigger,
> > > > >   etc)
> > > > > - begin fleshing out the common composite transforms (Count, Join,
> > > > >   etc) and IO connectors (Text, Kafka, etc)
> > > > > - have at least one runner that can execute the complete model (may
> > > > >   be a direct runner)
> > > > > - provide integration tests for executing against current and
> > > > >   future runners
> > > >
> > > >
> > > > > * A note on DSLs:  I think it’s important to separate out an SDK
> > > > > from a DSL, because in my mind the former is by definition
> > > > > equivalent to the Beam model, while the latter may select portions
> > > > > of the model or change the user-visible abstractions in order to
> > > > > provide a domain-specific experience. We may want to encourage some
> > > > > DSLs to live separately from Beam because they may look completely
> > > > > non-Beam-like to their end users. But we can probably punt this
> > > > > decision until we have concrete examples to discuss.
> > > > >
> > > > > Another fun part of this growth is that we’ll likely grow new
> > > > > committers. And given the breadth of Beam, I think it would be
> > > > > useful to annotate our committers [2] page with which components
> > > > > folks are the most knowledgeable about.
> > > >
> > > > Looking 

Re: Failing Jenkins Runs

2016-05-19 Thread Davor Bonaci
This is a wider problem, not specific to our project, tracked by
INFRA-11878 [1]. Nothing we can do right now.

[1] https://issues.apache.org/jira/browse/INFRA-11878

On Thu, May 19, 2016 at 2:21 AM, Aljoscha Krettek 
wrote:

> Hi,
> on all of the recent PRs Jenkins fails with this message:
> https://builds.apache.org/job/beam_PreCommit_MavenVerify/1213/console
>
> Does anyone have an idea what might be going on? Also, where is Jenkins
> configured? With this I could take a look myself.
>
> -Aljoscha
>


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #43

2016-05-16 Thread Davor Bonaci
Builds #42 and #43 failed due to a repository outage. No action needed.

On Mon, May 16, 2016 at 12:07 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> --
> [...truncated 91 lines...]
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ sdks-parent ---
> [INFO] Deleting <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/sdks/target>
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce) @ sdks-parent ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> sdks-parent ---
> [INFO]
> [INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @
> sdks-parent ---
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @
> sdks-parent ---
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/sdks/pom.xml>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/sdks-parent-0.1.0-incubating-SNAPSHOT.pom
> >
> [INFO]
> [INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ sdks-parent
> ---
> [INFO] Downloading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> [INFO] Downloaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> (627 B at 0.1 KB/sec)
> [INFO] Uploading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/sdks-parent-0.1.0-incubating-20160516.070200-41.pom
> [INFO] Uploaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/sdks-parent-0.1.0-incubating-20160516.070200-41.pom
> (2 KB at 0.0 KB/sec)
> [INFO] Downloading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/maven-metadata.xml
> [INFO] Downloaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/maven-metadata.xml
> (297 B at 0.0 KB/sec)
> [INFO] Uploading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> [INFO] Uploaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> (627 B at 0.2 KB/sec)
> [INFO] Uploading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/maven-metadata.xml
> [INFO] Uploaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/sdks-parent/maven-metadata.xml
> (297 B at 0.0 KB/sec)
> [INFO]
> [INFO]
> 
> [INFO] Building Apache Beam :: SDKs :: Java 0.1.0-incubating-SNAPSHOT
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ java-sdk-parent
> ---
> [INFO] Deleting <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/sdks/java/target
> >
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce) @ java-sdk-parent
> ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @
> java-sdk-parent ---
> [INFO]
> [INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @
> java-sdk-parent ---
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @
> java-sdk-parent ---
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/sdks/java/pom.xml>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/java-sdk-parent/0.1.0-incubating-SNAPSHOT/java-sdk-parent-0.1.0-incubating-SNAPSHOT.pom
> >
> [INFO]
> [INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @
> java-sdk-parent ---
> [INFO] Downloading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java-sdk-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> [INFO] Downloaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java-sdk-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
> (631 B at 1.1 KB/sec)
> [INFO] Uploading:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java-sdk-parent/0.1.0-incubating-SNAPSHOT/java-sdk-parent-0.1.0-incubating-20160516.070335-41.pom
> [INFO] Uploaded:
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java-sdk-parent/0.1.0-incubating-SNAPSHOT/java-sdk-parent-0.1.0-incubating-20160516.070335-41.pom
> (2 KB at 3.0 KB/sec)
> [INFO] Downloading:
> https:

Re: TypeDescriptors Example Code

2016-05-16 Thread Davor Bonaci
Sure -- go ahead. (I'd probably avoid static import, however. One word
more, but more readable.)

On Mon, May 16, 2016 at 9:05 AM, Jesse Anderson 
wrote:

> Does anyone have any thoughts or concerns with me changing the example code
> to use the new TypeDescriptors class from the inline creation of a
> TypeDescriptor?
>
> For example, MinimalWordCountJava8 would change from:
>
>
> p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*"))
>  .apply(FlatMapElements.via((String word) ->
>      Arrays.asList(word.split("[^a-zA-Z']+")))
>      .withOutputType(new TypeDescriptor<String>() {}))
>
> to:
>
> p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*"))
>  .apply(FlatMapElements.via((String word) ->
>      Arrays.asList(word.split("[^a-zA-Z']+")))
>      .withOutputType(strings()))
>
> I'd use a static import to decrease the code footprint.
>
> Thanks,
>
> Jesse
>
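
For readers wondering why the output type has to be spelled out at all in the
snippets quoted above: Java erases generic type arguments at runtime, and the
anonymous-subclass form is a common way to keep them. The following is only a
rough sketch of that general technique, not Beam's implementation. TypeToken
and TypeTokens are hypothetical stand-ins invented for this illustration; they
merely mirror the roles of an inline descriptor and a factory helper such as
strings().

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class TypeTokenDemo {
    public static void main(String[] args) {
        // Inline creation: the trailing {} makes an anonymous subclass.
        TypeToken<String> inline = new TypeToken<String>() {};
        // Factory creation: same captured type, less noise at the call site.
        TypeToken<String> viaFactory = TypeTokens.strings();

        System.out.println(inline.getType());
        System.out.println(viaFactory.getType());
    }
}

// Hypothetical stand-in for a type descriptor (assumption, not a Beam class).
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // The generic superclass of an anonymous subclass still carries the
        // actual type argument, even though T itself is erased.
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException("TypeToken needs a type argument");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

// Hypothetical factory holder (assumption), playing the role of a helper class.
final class TypeTokens {
    private TypeTokens() {}

    static TypeToken<String> strings() {
        return new TypeToken<String>() {};
    }
}
```

Calling TypeTokens.strings() with the class name spelled out, rather than a
statically imported strings(), corresponds to the more readable variant
suggested in the reply.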


Process / contribution guide

2016-05-08 Thread Davor Bonaci
Hi everyone,
I wanted to send a quick reminder that we should all try to follow our own
contribution guide.

Recently, there have been several cases where commits didn't go through
pull requests / review, pull requests were merged differently or not
closed automatically by tooling, etc.

I'd kindly ask everyone to try their best to follow our own process. That said, we
now have more experience in this type of development -- if there's any
point that should be re-discussed, please bring it up for consideration.

Thanks!

Davor


Re: Towards Apache Beam 0.1-incubating

2016-05-08 Thread Davor Bonaci
Yes -- this first release is coming!

As we discussed on the virtual meeting last week, we'd like to build the
repeatable and auditable release process for everyone. Along with testing
infrastructure, this should result in our ability to release quickly,
reliably, and frequently. This is currently underway; hopefully we can have
a very basic version before the first release, which we can then improve
over time.

Additionally, we were discussing a "release checklist"-type of approach on
a per-module basis. Again, something that would benefit us early on.

Thus, quite a few things to do over the next week or so. More information
coming next week!

On Sun, May 8, 2016 at 11:40 AM, Jean-Baptiste Onofré 
wrote:

> Hi beamers,
>
> as discussed previously, we are planning the first 0.1-incubating release
> next week.
>
> I started to review the repo, and I will push some changes in preparation
> for the release (especially the NOTICE file, some checkstyle fixes, etc).
>
> If you have some pending PRs that you want to include in this first
> release, please let me know.
>
> It's totally possible to postpone the release to the week after if
> required: no pressure ;)
>
> Thanks,
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: ApacheCon next week

2016-05-08 Thread Davor Bonaci
I'll be there Monday through either Wednesday or Thursday.

On Sun, May 8, 2016 at 1:53 AM, Jean-Baptiste Onofré 
wrote:

> Hi Martin,
>
> yes, I will be at ApacheBigData & ApacheCon Tuesday, Wednesday, Thursday.
>
> I will land Monday evening and fly back to France on Thursday night.
>
> Regards
> JB
>
>
> On 05/08/2016 09:59 AM, Martin Suchanek wrote:
>
>> Hi Davor, JB,
>>
>> Will you be at the "Apache: Big Data" (
>> http://events.linuxfoundation.org/events/apache-big-data-north-america )
>> conference as well, Monday through Wednesday?
>>
>> On Sat, May 7, 2016 at 10:37 PM Jean-Baptiste Onofré 
>> wrote:
>>
>> Hi Davor,
>>>
>>> I will be there ;)
>>>
>>> I added a Beam session in the podlings shark tank:
>>>
>>> https://wiki.apache.org/apachecon/ACEU16PodlingSharkTank
>>>
>>>
>>>
>>> http://apachecon2016.sched.org/event/6OIN/podlings-shark-tank-roman-shaposhnik-pivotal?iframe=no&w=i:100;&sidebar=yes&bg=no
>>>
>>> @Davor, I count on you for this session with me ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 05/07/2016 10:18 PM, Davor Bonaci wrote:
>>>
>>>> Hi everyone,
>>>> I wanted to check if any of the current or prospective Beam contributors
>>>> will be attending ApacheCon next week in Vancouver.
>>>>
>>>> If you'll be there and would like to talk about all-things-Beam, please
>>>> reach out!
>>>>
>>>> JB, the two of us will talk, of course. See you there ;)
>>>>
>>>> Thanks,
>>>> Davor
>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #34

2016-05-08 Thread Davor Bonaci
Retrying, as it seems like an external issue.

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy)
on project direct-runner: Failed to deploy artifacts: Could not transfer
artifact
org.apache.beam:direct-runner:jar:tests:0.1.0-incubating-20160508.073321-9
from/to apache.snapshots.https (
https://repository.apache.org/content/repositories/snapshots): Failed to
transfer file:
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/direct-runner/0.1.0-incubating-SNAPSHOT/direct-runner-0.1.0-incubating-20160508.073321-9-tests.jar.
Return code is: 502, ReasonPhrase: Proxy Error. -> [Help 1]

On Sun, May 8, 2016 at 12:35 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  >
>
> Changes:
>
> [jbonofre] [BEAM-267] Enable checkstyle in Spark runner
>
> --
> [...truncated 5149 lines...]
> [WARNING] grpc-all-0.12.0.jar, grpc-auth-0.12.0.jar define 2 overlapping
> classes:
> [WARNING]   - io.grpc.auth.ClientAuthInterceptor$1
> [WARNING]   - io.grpc.auth.ClientAuthInterceptor
> [WARNING] runners-core-0.1.0-incubating-SNAPSHOT.jar,
> java-sdk-all-0.1.0-incubating-SNAPSHOT.jar define 1717 overlapping classes:
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.TreeRangeSet$ComplementRangesByLowerBound$2
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.WellBehavedMap$EntrySet$1$1
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.util.concurrent.CycleDetectingLockFactory$Policies$1
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.Maps$6
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.primitives.UnsignedBytes$LexicographicalComparatorHolder$UnsafeComparator$1
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.Collections2$OrderedPermutationCollection
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.Range$1
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.base.Splitter$2
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.base.Equivalence$Identity
> [WARNING]   -
> org.apache.beam.sdk.repackaged.com.google.common.collect.Lists$1
> [WARNING]   - 1707 more...
> [WARNING] maven-shade-plugin has detected that some class files are
> [WARNING] present in two or more JARs. When this happens, only one
> [WARNING] single version of the class is copied to the uber jar.
> [WARNING] Usually this is not harmful and you can skip these warnings,
> [WARNING] otherwise try to manually exclude artifacts based on
> [WARNING] mvn dependency:tree -Ddetail=true and the above output.
> [WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin
> [INFO] Replacing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/target/direct-runner-bundled-0.1.0-incubating-SNAPSHOT.jar>
> with <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/target/direct-runner-0.1.0-incubating-SNAPSHOT-shaded.jar
> >
> [INFO]
> [INFO] --- maven-surefire-plugin:2.18.1:test (runnable-on-service-tests) @
> direct-runner ---
> [INFO] Tests are skipped.
> [INFO]
> [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
> direct-runner ---
> [INFO] No dependency problems found
> [INFO]
> [INFO] --- maven-checkstyle-plugin:2.17:check (default) @ direct-runner ---
> [INFO] Starting audit...
> Audit done.
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @
> direct-runner ---
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/target/direct-runner-0.1.0-incubating-SNAPSHOT.jar>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/direct-runner/0.1.0-incubating-SNAPSHOT/direct-runner-0.1.0-incubating-SNAPSHOT.jar
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/dependency-reduced-pom.xml>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/direct-runner/0.1.0-incubating-SNAPSHOT/direct-runner-0.1.0-incubating-SNAPSHOT.pom
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/target/direct-runner-0.1.0-incubating-SNAPSHOT-tests.jar>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/direct-runner/0.1.0-incubating-SNAPSHOT/direct-runner-0.1.0-incubating-SNAPSHOT-tests.jar
> >
> [INFO] Installing <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/runners/direct-java/target/direct-runner-0.1.0-incubating-SNAPSHOT-sources.jar>
> to <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository/org/apache/beam/direct-runner/0.1.0-incubating-SNAPSHOT/di

ApacheCon next week

2016-05-07 Thread Davor Bonaci
Hi everyone,
I wanted to check if any of the current or prospective Beam contributors
will be attending ApacheCon next week in Vancouver.

If you'll be there and would like to talk about all-things-Beam, please
reach out!

JB, the two of us will talk, of course. See you there ;)

Thanks,
Davor


Testing resources are down

2016-05-02 Thread Davor Bonaci
Our Jenkins testing resources are down, following an outage of
builds.apache.org.

Jason is investigating.


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #24

2016-04-29 Thread Davor Bonaci
Known breakage from yesterday. Restarted.

On Fri, Apr 29, 2016 at 12:11 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  >
>
> Changes:
>
> [tgroh] Add CommittedResult
>
> [tgroh] Stop cloning coders in the InProcessRunner
>
> [swegner] Consolidate checkstyle configuration in new 'build-tools' module
>
> --
> [...truncated 850 lines...]
> [INFO] Including io.netty:netty-common:jar:4.1.0.Beta8 in the shaded jar.
> [INFO] Including io.netty:netty-transport:jar:4.1.0.Beta8 in the shaded
> jar.
> [INFO] Including io.netty:netty-resolver:jar:4.1.0.Beta8 in the shaded jar.
> [INFO] Including io.netty:netty-codec:jar:4.1.0.Beta8 in the shaded jar.
> [INFO] Including com.google.api.grpc:grpc-pubsub-v1:jar:0.0.2 in the
> shaded jar.
> [INFO] Including com.google.api.grpc:grpc-core-proto:jar:0.0.3 in the
> shaded jar.
> [INFO] Including com.google.api-client:google-api-client:jar:1.21.0 in the
> shaded jar.
> [INFO] Including
> com.google.apis:google-api-services-bigquery:jar:v2-rev248-1.21.0 in the
> shaded jar.
> [INFO] Including
> com.google.apis:google-api-services-pubsub:jar:v1-rev7-1.21.0 in the shaded
> jar.
> [INFO] Including
> com.google.apis:google-api-services-storage:jar:v1-rev53-1.21.0 in the
> shaded jar.
> [INFO] Including com.google.http-client:google-http-client:jar:1.21.0 in
> the shaded jar.
> [INFO] Including org.apache.httpcomponents:httpclient:jar:4.0.1 in the
> shaded jar.
> [INFO] Including org.apache.httpcomponents:httpcore:jar:4.0.1 in the
> shaded jar.
> [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded
> jar.
> [INFO] Including commons-codec:commons-codec:jar:1.3 in the shaded jar.
> [INFO] Including
> com.google.http-client:google-http-client-jackson:jar:1.21.0 in the shaded
> jar.
> [INFO] Including
> com.google.http-client:google-http-client-jackson2:jar:1.21.0 in the shaded
> jar.
> [INFO] Including
> com.google.http-client:google-http-client-protobuf:jar:1.21.0 in the shaded
> jar.
> [INFO] Including
> com.google.oauth-client:google-oauth-client-java6:jar:1.21.0 in the shaded
> jar.
> [INFO] Including com.google.oauth-client:google-oauth-client:jar:1.21.0 in
> the shaded jar.
> [INFO] Including
> com.google.apis:google-api-services-datastore-protobuf:jar:v1beta2-rev1-4.0.0
> in the shaded jar.
> [INFO] Including com.google.cloud.bigdataoss:gcsio:jar:1.4.3 in the shaded
> jar.
> [INFO] Including com.google.api-client:google-api-client-java6:jar:1.20.0
> in the shaded jar.
> [INFO] Including
> com.google.api-client:google-api-client-jackson2:jar:1.20.0 in the shaded
> jar.
> [INFO] Including com.google.cloud.bigdataoss:util:jar:1.4.3 in the shaded
> jar.
> [INFO] Excluding com.google.guava:guava:jar:19.0 from the shaded jar.
> [INFO] Including com.google.protobuf:protobuf-java:jar:3.0.0-beta-1 in the
> shaded jar.
> [INFO] Including com.google.code.findbugs:jsr305:jar:3.0.1 in the shaded
> jar.
> [INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.7.0 in the
> shaded jar.
> [INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.7.0
> in the shaded jar.
> [INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.7.0 in
> the shaded jar.
> [INFO] Including org.slf4j:slf4j-api:jar:1.7.14 in the shaded jar.
> [INFO] Including org.apache.avro:avro:jar:1.7.7 in the shaded jar.
> [INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.13 in the
> shaded jar.
> [INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 in the
> shaded jar.
> [INFO] Including com.thoughtworks.paranamer:paranamer:jar:2.3 in the
> shaded jar.
> [INFO] Including org.xerial.snappy:snappy-java:jar:1.1.2.1 in the shaded
> jar.
> [INFO] Including org.apache.commons:commons-compress:jar:1.9 in the shaded
> jar.
> [INFO] Including joda-time:joda-time:jar:2.4 in the shaded jar.
> [INFO] Including org.codehaus.woodstox:stax2-api:jar:3.1.4 in the shaded
> jar.
> [INFO] Including org.codehaus.woodstox:woodstox-core-asl:jar:4.4.1 in the
> shaded jar.
> [INFO] Including org.tukaani:xz:jar:1.5 in the shaded jar.
> [INFO] Including com.google.auto.service:auto-service:jar:1.0-rc2 in the
> shaded jar.
> [INFO] Including com.google.auto:auto-common:jar:0.3 in the shaded jar.
> [WARNING] grpc-all-0.12.0.jar, grpc-protobuf-0.12.0.jar define 4
> overlapping classes:
> [WARNING]   - io.grpc.protobuf.ProtoUtils$2
> [WARNING]   - io.grpc.protobuf.ProtoUtils
> [WARNING]   - io.grpc.protobuf.ProtoUtils$1
> [WARNING]   - io.grpc.protobuf.ProtoInputStream
> [WARNING] grpc-protobuf-nano-0.12.0.jar, grpc-all-0.12.0.jar define 4
> overlapping classes:
> [WARNING]   - io.grpc.protobuf.nano.NanoProtoInputStream
> [WARNING]   - io.grpc.protobuf.nano.NanoUtils
> [WARNING]   - io.grpc.protobuf.nano.NanoUtils$1
> [WARNING]   - io.grpc.protobuf.nano.MessageNanoFactory
> [WARNING] grpc-all-0.12.0.jar, grpc-core-0.12.0.jar define 248 overlapping
>

IO timelines (Was: How to read/write avro data using FlinkKafka Consumer/Producer)

2016-04-28 Thread Davor Bonaci
[ Moving over to the dev@ list ]

I think we should be aiming a little higher than "trying out Beam" ;)

Beam SDK currently has built-in IOs for Kafka, as well as for all important
Google Cloud Platform services. Additionally, there are pull requests for
Firebase and Cassandra. This is not bad, particularly taking into account
that we have APIs for users to develop their own IO connectors. Of course,
there's a long way to go, but there should *not* be any users that are
blocked or scenarios that are impossible.

In terms of the runner support, Cloud Dataflow runner supports all IOs,
including any user-written ones. Other runners don't support them as
extensively, but this is a high priority item to address.

In my mind, we should strive to address the following:

   - Complete conversion of existing IOs to the Source / Sink API. ETA: a
   week or two for full completion.
   - Make sure Spark & Flink runners fully support Source / Sink API, and
   that ties into the new Runner / Fn API discussion.
   - Increase the set of built-in IOs. No ETA; iterative process over time.
   There are 2 pending pull requests, others in development.

I'm hopeful we can address all of these items in a relatively short period
of time -- in a few months or so -- and likely before we can call any
release "stable". (This is why the new Runner / Fn API discussions are so
important.)

In summary, in my mind, "long run" here means "< few months".

-- Forwarded message --
From: Maximilian Michels 
Date: Thu, Apr 28, 2016 at 3:20 AM
Subject: Re: How to read/write avro data using FlinkKafka Consumer/Producer
(Beam Pipeline) ?
To: u...@beam.incubator.apache.org

On Wed, Apr 27, 2016 at 11:12 PM, Jean-Baptiste Onofré 
wrote:
> generally speaking, we have to check that all runners work fine with the
provided IO. I don't think it's a good idea that the runners themselves
implement any IO: they should use "out of the box" IO.

In the long run, big yes, and I'd like to help make it possible!
However, there is still a gap between what Beam and its Runners
provide and what users want to do. For the time being, I think the
solution we have is fine. It gives users the option to try out Beam
with sources and sinks that they expect to be available in streaming
systems.


Re: [DISCUSS] DSL and type converter

2016-04-28 Thread Davor Bonaci
Typing is often hard ;)

This sounds like a DSL-specific design decision. Perhaps we could start by
specifying what the objectives and capabilities of this particular DSL
would be. I think we would then be able to comment on the advantages and
disadvantages of various choices. Otherwise, it is hard to assess how a
particular choice would impact the end goal.

On Thu, Apr 28, 2016 at 5:39 AM, Jean-Baptiste Onofré 
wrote:

> Hi all,
>
> I started to sketch a couple of declarative DSLs (XML and JSON) on top of
> the SDK (I created a new dsl module on my local git).
>
> When using the SDK, the user "controls" and knows the type of the data.
>
> For instance, if the pipeline starts with a Kafka source, the user knows
> that he will have a PCollection of KafkaRecords (it can eventually use a
> coder).
>
> Imagine we have a DSL like this (just an example):
>
> 
>   
>   
> 
>
> The KafkaRecord collection from the Kafka source has to be "converted"
> into a collection of String for instance.
>
> In the DSL, I think it makes sense to do it "implicitly". If I compare
> with what we are doing in Apache Camel, the DSL could have a DataExchange
> context where we can store a set of TypeConverters. It's basically a Map to
> convert from one type (KafkaRecord) to another type (String). It means that
> the IO have to define the expected type (provided for source, consumed for
> sink).
>
> Generally speaking, we can imagine using Avro to convert (mostly) any type.
>
> Thoughts ?
>
> Thanks,
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
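
The "Map of TypeConverters" idea in the quoted proposal could look roughly
like the sketch below. This is an illustration under stated assumptions, not
code from the thread or from Beam or Camel: TypeConverterRegistry and
KafkaRecordStub are hypothetical names, and the registry is keyed by a simple
"from->to" string to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class TypeConverterDemo {
    public static void main(String[] args) {
        TypeConverterRegistry registry = new TypeConverterRegistry();
        // A DSL could register this once, so pipelines convert implicitly.
        registry.register(KafkaRecordStub.class, String.class, rec -> rec.value);

        String text = registry.convert(new KafkaRecordStub("k1", "hello"), String.class);
        System.out.println(text);
    }
}

// Hypothetical record type standing in for whatever a Kafka source emits.
final class KafkaRecordStub {
    final String key;
    final String value;

    KafkaRecordStub(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical registry: maps a (source type, target type) pair to a converter.
final class TypeConverterRegistry {
    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    <A, B> void register(Class<A> from, Class<B> to,
                         Function<? super A, ? extends B> converter) {
        converters.put(key(from, to), input -> converter.apply(from.cast(input)));
    }

    <B> B convert(Object value, Class<B> to) {
        // No conversion needed when the value already has the target type.
        if (to.isInstance(value)) {
            return to.cast(value);
        }
        Function<Object, Object> converter = converters.get(key(value.getClass(), to));
        if (converter == null) {
            throw new IllegalArgumentException(
                "No converter from " + value.getClass().getName() + " to " + to.getName());
        }
        return to.cast(converter.apply(value));
    }

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }
}
```

Note that the lookup uses the exact runtime class, so a converter registered
for a supertype would not be found; whether a DSL wants that is one of the
design choices the reply asks to pin down first.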


Re: [DISCUSS] Beam IO &runners native IO

2016-04-28 Thread Davor Bonaci
Generally speaking, the SDKs define all user APIs, including all IOs. We
should strive that users never use any runner-specific APIs directly. As
such, there should be no runner-provided IOs visible to user. ( Of course,
some exceptions will have to apply, such as runner-specific configuration
via PipelineOptions, etc. )

All SDK-provided IOs should be written in terms of the Source / Sink API. All
runners should support running pipelines that use these APIs. In that world,
all IOs would run on all runners. However, neither of these is true
currently:

   - We used to have all sources and sinks implemented differently in a
   runner-native way. Over the last month, we have converted TextIO, AvroIO,
   and BigQueryIO.Write to follow this design. (Thanks Luke and Pei!)
   - BigQueryIO.Read and PubsubIO are the last pieces left in the old
   design, and there are pending PRs to address those. (Thanks Pei and Mark!)
   - Neither Flink nor Spark fully supports the Source / Sink API, AFAIK. There
   are outstanding JIRA issues to address those -- those are high priority,
   IMO.

At execution time, any runner is free to replace the SDK-provided IO with a
runner-native one, as appropriate. For example, a runner may have a faster
implementation than the SDK-provided one. That choice should be
transparent, and made by the runner, not the user.

(Aside, this is why the runner API is so important -- that runners have
enough information to make the right choice on behalf of the user, without
needing to delegate implementation details to users -- no user knobs!)

This is the current design, which we believe addresses all scenarios we
care about:

   - All IOs run on all runners.
   - Any runner can provide a better or faster runner-native implementation
   of any IO.
   - Users are abstracted away from all implementation details.
   - All pipelines are runner-portable, because users don't use any
   runner-specific APIs directly.
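
To make the design above concrete, here is a minimal sketch of a runner
choosing between an SDK-provided IO and its own native replacement. It does
not use any Beam classes: Read, Runner, withNative and translate are all
hypothetical names invented for illustration, and the real runner API may
look quite different.

```java
import java.util.HashMap;
import java.util.Map;

public class RunnerIoOverrideSketch {

    /** Stand-in for an SDK-provided IO transform (hypothetical, not a Beam type). */
    interface Read {
        String describe();
    }

    /** Hypothetical runner that may hold native replacements, keyed by IO name. */
    static class Runner {
        private final String name;
        private final Map<String, Read> nativeOverrides = new HashMap<>();

        Runner(String name) {
            this.name = name;
        }

        /** Registers a runner-native implementation for the given IO name. */
        Runner withNative(String ioName, Read nativeImpl) {
            nativeOverrides.put(ioName, nativeImpl);
            return this;
        }

        /** The runner, not the user, picks which implementation executes. */
        String translate(String ioName, Read sdkProvided) {
            Read chosen = nativeOverrides.getOrDefault(ioName, sdkProvided);
            return name + " runs " + chosen.describe();
        }
    }

    public static void main(String[] args) {
        // The user only ever names the SDK-level IO.
        Read sdkKafka = () -> "SDK-provided KafkaIO";

        Runner plain = new Runner("RunnerA");
        Runner tuned = new Runner("RunnerB")
            .withNative("KafkaIO", () -> "runner-native Kafka connector");

        System.out.println(plain.translate("KafkaIO", sdkKafka));
        System.out.println(tuned.translate("KafkaIO", sdkKafka));
    }
}
```

The point of the sketch is that the pipeline code is identical for both
runners; only the runner's own override table differs, so there is no user
knob to set.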


On Thu, Apr 28, 2016 at 8:33 AM, Amit Sela  wrote:

> From the Spark runner point of view, the implementation of the KafkaIO (for
> example) is to define the "settings" required to read from Kafka and from a
> quick look at the SDK's KafkaIO, it looks like it could be used instead of
> the runner's implementation (and if not now, then probably once Spark
> supports Kafka 0.9 connector API).
>
> As for the bigger picture here, as far as I can see, IO *translation* will
> always be runner-specific because they either create whatever PCollections
> represent from an external source, or output from whatever PCollections
> represent. So I think translation will always be runner-specific for IOs.
>
> Back to the IOs themselves, the SDK should allow the runner to extend its
> implementation of the IO if and where needed, so if the KafkaIO is missing
> Encoder/Decoder kafka serializer settings, it could just add those.
>
> Does this make sense ?
>
>
> On Thu, Apr 28, 2016 at 3:45 PM Jean-Baptiste Onofré 
> wrote:
>
> > Hi all,
> >
> > regarding the recent threads on the mailing list, I would like to start
> > a formal discussion around the IO.
> > As we can expect the first contributions on this area (I already have
> > some work in progress around this ;)), I think it's a fair discussion to
> > have.
> >
> > Now, we have two kinds of IO: the one "generic" to Beam, the one "local"
> > to the runners.
> >
> > For example, let's take Kafka: we have the KafkaIO (in IO), and for
> > instance, we have the spark-streaming kafka connector (in Spark Runner).
> >
> > Right now, we have two approaches for the user:
> > 1. In the pipeline, we use KafkaIO from Beam: it's the preferred
> > approach for sure. However, the user may want to use the runner specific
> > IO for two reasons:
> > * Beam doesn't provide the IO yet (for instance, spark cassandra
> > connector is available whereas we don't have yet any CassandraIO (I'm
> > working on it anyway ;))
> > * The runner native IO is optimized or contains more features than
> > the Beam native IO
> > 2. So, for the previous reasons, the user could want to use the native
> > runner IO. The drawback of this approach is that the pipeline will be
> > tied to a specific runner, which is completely against the Beam design.
> >
> > I wonder if it wouldn't make sense to add a flag on the IO API (and
> > related on Runner API) like .useNative().
> >
> > For instance, the user would be able to do:
> >
> > pipeline.apply(KafkaIO.read().withBootstrapServers("...").withTopics("...").useNative(true));
> >
> > then, if the runner has a "native" IO, it will use it, else, if
> > useNative(false) (the default), it won't use any runner native IO.
> >
> > The point there is for the configuration: assuming the Beam IO and the
> > runner IO can differ, it means that the "Beam IO" would have to populate
> > all runner specific IO configuration.
> >
> > Of course, it's always possible to use a PTransform to wrap the runner
> > native IO, but we are back on the same concern: the pipe

Re: Podling report for May 16

2016-04-25 Thread Davor Bonaci
I'd keep this in the draft state, since May 16th is several weeks away.

There's obviously a lot of progress already, but we might have a ton more
by then. For example, the first release might happen, among other things.

On Mon, Apr 25, 2016 at 1:15 PM, Jean-Baptiste Onofré 
wrote:

> Hi all,
>
> As I asked last month to the IPMC, I prepared the podling report for May
> 16:
>
> https://wiki.apache.org/incubator/May2016
>
> Please, review it and let me know if I forgot something.
>
> Thanks !
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Will Beam provide a machine learning API in the future?

2016-04-20 Thread Davor Bonaci
It seems like there's a lot of community interest in ML running on Beam --
definitely something that we should eventually have in Beam.

Hopefully, we'll be able to coordinate individual efforts to come up with a
unified API. It fits right in with Beam goals to have a library of ML
PTransforms that isn't tied to any particular ML backend. Then, users will
have portability benefits and will be able to make the right choice for
them for each execution.

Overall, I think this is a complex feature with a really big impact and
benefit to Beam. As such, it would be great to write up and discuss
architecture and design in detail first.

--

In terms of specific questions, a library of PTransforms would probably be
a better start than a DSL (but that doesn't exclude the possibility of a
DSL some day). There would be a default implementation, and then each
runner could override it, as appropriate.

I think Simone's warning should be taken into account, however. Definitely
something to have in mind as the design progresses.


Re: add use cases to capability matrix

2016-04-20 Thread Davor Bonaci
I would like to avoid complicating the capability matrix itself with such
details. Hopefully, user documentation for each of these features would
(eventually) give insight into what you could use them for, and we could
cross-link to that. For now, you can refer to the Dataflow SDK
documentation to get some of this information [1]. (We'll have that ported
over to Beam soon.)

To answer your specific question about priority, you should probably
prioritize "what" over "where" over "when" over "how" parts. That said, it
is probably fine to advance to the next category once you have figured out
the first few bullets in the current category.

[1] https://cloud.google.com/dataflow/model/programming-model

On Wed, Apr 20, 2016 at 2:11 AM, Jean-Baptiste Onofré 
wrote:

> Hi Manu,
>
> generally speaking, we have to add a complete getting started guide with
> "real" use cases to illustrate beam usage.
>
> I'm preparing some website PR about this (with the overview of IOs,
> DSLs/SDKs, runners, etc we discussed earlier).
>
> Regards
> JB
>
>
> On 04/20/2016 10:22 AM, Manu Zhang wrote:
>
>> Guys,
>>
>> Do you think it's valuable to add real world use cases to capability
>> matrix?
>> Then, we could know why a particular capability is needed and which should
>> be prioritized for runner implementations.
>> I found some examples in the Dataflow paper (3.3)
>> <
>> http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43864.pdf
>> >
>> and another reference is
>> http://www.vldb.org/pvldb/vol8/p2040-Kejariwal.pdf.
>>
>> Thanks,
>> Manu Zhang
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [MENTOR] Google docs (was: Will Beam provide a machine learning API in the future?)

2016-04-20 Thread Davor Bonaci
It is definitely the case that all documents are open. I think most of us
would like to continue sharing draft documents this way.

We have planned to move longer-lived documents to an Apache repository all
along -- lots to do on the website part, however ;)

On Wed, Apr 20, 2016 at 4:32 AM, Bertrand Delacretaz  wrote:

> Hi JB,
>
> On Wed, Apr 20, 2016 at 1:21 PM, Jean-Baptiste Onofré 
> wrote:
> > ...When the documents are likely "finalized", then, they will become
> > part of the website repo...
>
> Ok. That's somewhat fine if every community member can contribute to
> them (which IIUC is the case), though my preference goes to having
> such documents in an ASF hosted repository.
>
> Markdown works fine IMO for collaboration if you use it from the
> beginning, but moving from a different format is probably less
> efficient.
>
> -Bertrand
>


Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-20 Thread Davor Bonaci
If we come up with a general approach in the context of the Flink runner,
perhaps that piece can go back to the "runner-core" component and be
adopted more widely.

On Wed, Apr 20, 2016 at 8:13 AM, Kenneth Knowles 
wrote:

> Hi Aljoscha,
>
> Great idea.
>
>  - The logic for matching up the windows is WindowFn#getSideInputWindow [1]
>  - The SDK used to have something along the lines of what you describe [2]
> but we thought it was too runner-specific, directly referencing Dataflow
> details, and with a particular model of buffering + timer. Perhaps it is a
> starting place for your design?
>
> Kenn
>
> [1]
>
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/WindowFn.java#L131
>
> [2]
>
> https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/1e5524a7f5d0d774488cb0206ea6433085461775/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/worker/StreamingSideInputDoFnRunner.java
>
> On Wed, Apr 20, 2016 at 4:25 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi Aljoscha
> >
> > AFAIR, the Runner API Proposal document (from Kenneth) contains some
> > points about side input.
> >
> >
> >
> https://drive.google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc&usp=sharing
> >
> > I don't think it goes into the details of side inputs and windows, but
> > definitely the document we should extend.
> >
> > Regards
> > JB
> >
> >
> >
> > On 04/20/2016 11:55 AM, Aljoscha Krettek wrote:
> >
> >> Hi,
> >> for https://issues.apache.org/jira/browse/BEAM-102 we will need to have
> >> some functionality that deals with side inputs and windows (of both the
> >> main input and the side inputs) and how they get matched and how we wait
> >> for windows (blocking). I imagine that we could add some component that
> >> is similar to ReduceFnRunner but for side inputs: We would just
> >> instantiate it with a factory for state storage, then push elements into
> >> it while processing and it would provide a way to get a SideInputReader.
> >>
> >> I think this would not be specific to the Flink runner because other
> >> runner implementors will face similar problems. Are there any ideas/design
> >> docs about such a thing already? If not, we should probably start
> >> designing.
> >>
> >> What do you think?
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: Massive package renaming coming

2016-04-20 Thread Davor Bonaci
The Google Cloud Dataflow runner has now been recovered. Special thanks
goes to Luke Cwik for working through several issues to get us here.

Jenkins project "beam_PostCommit_RunnableOnService_GoogleCloudDataflow" [1]
now verifies this integration. It runs a test pass for every code push to
the repository, with failure emails being sent to the commits@ mailing list.
From this point onward, let's make sure we address any breakage to this
test as soon as possible.

(Hopefully, we'll get this running in a pre-commit way too. That is,
unfortunately, blocked on INFRA-11610 [2].)

[1]
https://builds.apache.org/view/Beam/job/beam_PostCommit_RunnableOnService_GoogleCloudDataflow/
[2] https://issues.apache.org/jira/browse/INFRA-11610

On Wed, Apr 13, 2016 at 10:55 PM, Jean-Baptiste Onofré 
wrote:

> Great work guys !
>
> A big step forward.
>
> Thanks
> Regards
> JB
>
>
> On 04/14/2016 07:43 AM, Davor Bonaci wrote:
>
>> I've just merged a pull request that includes this project-wide renaming
>> of Java packages. (Long live Beam!)
>>
>> At this point, pending pull requests may need to be rebased. We'll try to
>> fix the Cloud Dataflow runner as soon as possible, but it might take a
>> little bit to complete that.
>>
>> There's still a lot left to re-organize in Beam, but I'd expect future
>> changes not to be this far-reaching.
>>
>> A special thanks goes to Ben Chambers for pulling off several tricks to
>> get this done quickly and effectively.
>>
>> On Tue, Apr 12, 2016 at 9:38 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>> Hi Davor,
>>>
>>> +1 !
>>>
>>> I already updated the PullRequest for annotations package.
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>>
>>> On 04/13/2016 03:23 AM, Davor Bonaci wrote:
>>>
>>> We are preparing to do a massive, project-wide package rename from
>>>> "com.google.cloud.dataflow" to "org.apache.beam". At the earliest, this
>>>> could occur sometime tomorrow afternoon (Pacific time).
>>>>
>>>> Unfortunately, there's no way to do this without affecting ongoing work.
>>>> We'll try to do it as quickly as possible to minimize such impact.
>>>>
>>>> We'll ensure that existing automated testing passes before merging the
>>>> change, with the exception of integration coverage with the Google Cloud
>>>> Dataflow service. We expect that the code in Beam's master will not work
>>>> against Cloud Dataflow for a little bit -- we'll accept this breakage
>>>> on a
>>>> one-time basis, and try to recover it as soon as possible thereafter.
>>>>
>>>> If anybody sees any issues with this plan, I'd love to hear it.
>>>>
>>>> This is one of those mandatory things we've been delaying for a while
>>>> now.
>>>> Of course, there are more such things to come, but hopefully none that
>>>> are
>>>> this wide.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Issues Mailing List

2016-04-20 Thread Davor Bonaci
I believe all JIRA-related mails to commits@ will have "[jira]" in the
subject line, so it is usually easy to set up proper filtering.
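
As a rough illustration of such filtering, the sketch below separates
subjects that carry the "[jira]" tag from the rest. The class, the method
names and the sample subject lines are all made up for the example; a real
setup would more likely be a filter rule in the mail client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JiraSubjectFilter {

    /** Tag assumed to appear in JIRA-generated subjects, per the message above. */
    static final String JIRA_TAG = "[jira]";

    /** Returns true when the subject carries the JIRA tag. */
    static boolean isJiraMail(String subject) {
        return subject != null && subject.contains(JIRA_TAG);
    }

    /** Keeps only the JIRA notifications from a list of subjects. */
    static List<String> jiraOnly(List<String> subjects) {
        return subjects.stream()
            .filter(JiraSubjectFilter::isJiraMail)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical subject lines, not taken from the real commits@ list.
        List<String> subjects = Arrays.asList(
            "[jira] [Created] (BEAM-0) Placeholder issue",
            "placeholder commit notification",
            "[jira] [Commented] (BEAM-0) Placeholder issue");

        for (String subject : jiraOnly(subjects)) {
            System.out.println(subject);
        }
    }
}
```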

On Wed, Apr 20, 2016 at 4:18 AM, Jean-Baptiste Onofré 
wrote:

> Hi Aljoscha,
>
> Yes, we use the comm...@beam.incubator.apache.org mailing list for that.
>
> It's what we do in most Apache projects (no need to create an
> additional mailing list for that).
>
> Regards
> JB
>
>
> On 04/20/2016 11:18 AM, Aljoscha Krettek wrote:
>
>> Hi,
>> is there something like iss...@beam.incubator.apache.org where I could
>> follow "issue created" and other updates? I didn't find it in our list of
>> mailing lists.
>>
>> If not, then maybe we should create it. At least I find it very helpful to
>> know about new issues without having to "poll" the issues list manually in
>> Jira.
>>
>> Cheers,
>> Aljoscha
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Build failed in Jenkins: beam_Release_NightlySnapshot #12

2016-04-19 Thread Davor Bonaci
Nightly build failed for what seems like a network connectivity issue.
Restarted as #13 [1].

[1] https://builds.apache.org/job/beam_Release_NightlySnapshot/13/

On Tue, Apr 19, 2016 at 12:02 AM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  >
>
> Changes:
>
> [mxm] [flink] improve InputFormat wrapper and ReadSourceITCase
>
> [mxm] [flink] improvements to UnboundedSource translation
>
> [mxm] [BEAM-158] add support for bounded sources in streaming
>
> [lcwik] [BEAM-202] Clean-up *CoderBase classes since we are on a newer
> version
>
> [dhalperi] [BEAM-50] BigQueryIO: move write validation to validate() from
> apply()
>
> [dhalperi] [BEAM-50] BigQueryIO: fix autocompleting project and test it
>
> --
> Started by timer
> [EnvInject] - Loading node environment variables.
> Building remotely on beam2 (beam) in workspace <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/>
>  > git rev-parse --is-inside-work-tree # timeout=10
> Fetching changes from the remote Git repository
>  > git config remote.origin.url
> https://github.com/apache/incubator-beam.git # timeout=10
> Fetching upstream changes from
> https://github.com/apache/incubator-beam.git
>  > git --version # timeout=10
>  > git -c core.askpass=true fetch --tags --progress
> https://github.com/apache/incubator-beam.git
> +refs/heads/*:refs/remotes/origin/*
>  > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
>  > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
> Checking out Revision bf78e966716c2630573c8ea135af9807053253b5
> (refs/remotes/origin/master)
>  > git config core.sparsecheckout # timeout=10
>  > git checkout -f bf78e966716c2630573c8ea135af9807053253b5
>  > git rev-list 7646384e2c2c45a384dfde6bb1ba20014ff4f733 # timeout=10
> [EnvInject] - Executing scripts and injecting environment variables after
> the SCM step.
> [EnvInject] - Injecting as environment variables the properties content
> SPARK_LOCAL_IP=127.0.0.1
>
> [EnvInject] - Variables injected successfully.
> Parsing POMs
> Established TCP socket on 54058
> maven32-agent.jar already up to date
> maven32-interceptor.jar already up to date
> maven3-interceptor-commons.jar already up to date
> [beam_Release_NightlySnapshot] $
> /home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66/bin/java
> -Xmx2g -Xms256m -XX:MaxPermSize=512m -cp
> /home/jenkins/jenkins-slave/maven32-agent.jar:/home/jenkins/jenkins-slave/tools/hudson.tasks.Maven_MavenInstallation/maven-3.3.3/boot/plexus-classworlds-2.5.2.jar:/home/jenkins/jenkins-slave/tools/hudson.tasks.Maven_MavenInstallation/maven-3.3.3/conf/logging
> jenkins.maven3.agent.Maven32Main
> /home/jenkins/jenkins-slave/tools/hudson.tasks.Maven_MavenInstallation/maven-3.3.3
> /home/jenkins/jenkins-slave/slave.jar
> /home/jenkins/jenkins-slave/maven32-interceptor.jar
> /home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 54058
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=512m; support was removed in 8.0
> <===[JENKINS REMOTING CAPACITY]===>   channel started
> Executing Maven:  -B -f <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/pom.xml>
> -Dmaven.repo.local=<
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/.repository>
> -B -e clean deploy
> [INFO] Error stacktraces are turned on.
> [INFO] Scanning for projects...
> [INFO]
> 
> [INFO] Reactor Build Order:
> [INFO]
> [INFO] Apache Beam :: Parent
> [INFO] Apache Beam :: SDKs
> [INFO] Apache Beam :: SDKs :: Java
> [INFO] Apache Beam :: SDKs :: Java :: Core
> [INFO] Apache Beam :: SDKs :: Java :: IO
> [INFO] Apache Beam :: SDKs :: Java :: IO :: HDFS
> [INFO] Apache Beam :: SDKs :: Java :: Tests
> [INFO] Apache Beam :: Runners
> [INFO] Apache Beam :: Runners :: Google Cloud Dataflow
> [INFO] Apache Beam :: Examples :: Java All
> [INFO] Apache Beam :: Runners :: Flink
> [INFO] Apache Beam :: Runners :: Flink :: Core
> [INFO] Apache Beam :: Runners :: Flink :: Examples
> [INFO] Apache Beam :: Runners :: Spark
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples
> [INFO] Apache Beam :: Examples :: Java 8 All
> [INFO] Apache Beam :: Examples
> [INFO]
> [INFO]
> 
> [INFO] Building Apache Beam :: Parent 0.1.0-incubating-SNAPSHOT
> [INFO]
> 
> [INFO]
> [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ parent ---
> [INFO] Deleting <
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ws/target>
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ parent ---
> [INFO]
> [INFO

Re: Massive package renaming coming

2016-04-13 Thread Davor Bonaci
I've just merged a pull request that includes this project-wide renaming of
Java packages. (Long live Beam!)

At this point, pending pull requests may need to be rebased. We'll try to
fix the Cloud Dataflow runner as soon as possible, but it might take a
little bit to complete that.

There's still a lot left to re-organize in Beam, but I'd expect future
changes not to be this far-reaching.

A special thanks goes to Ben Chambers for pulling off several tricks to get
this done quickly and effectively.

On Tue, Apr 12, 2016 at 9:38 PM, Jean-Baptiste Onofré 
wrote:

> Hi Davor,
>
> +1 !
>
> I already updated the PullRequest for annotations package.
>
> Thanks !
> Regards
> JB
>
>
> On 04/13/2016 03:23 AM, Davor Bonaci wrote:
>
>> We are preparing to do a massive, project-wide package rename from
>> "com.google.cloud.dataflow" to "org.apache.beam". At the earliest, this
>> could occur sometime tomorrow afternoon (Pacific time).
>>
>> Unfortunately, there's no way to do this without affecting ongoing work.
>> We'll try to do it as quickly as possible to minimize such impact.
>>
>> We'll ensure that existing automated testing passes before merging the
>> change, with the exception of integration coverage with the Google Cloud
>> Dataflow service. We expect that the code in Beam's master will not work
>> against Cloud Dataflow for a little bit -- we'll accept this breakage on a
>> one-time basis, and try to recover it as soon as possible thereafter.
>>
>> If anybody sees any issues with this plan, I'd love to hear it.
>>
>> This is one of those mandatory things we've been delaying for a while now.
>> Of course, there are more such things to come, but hopefully none that are
>> this wide.
>>
>> Thanks!
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Massive package renaming coming

2016-04-12 Thread Davor Bonaci
We are preparing to do a massive, project-wide package rename from
"com.google.cloud.dataflow" to "org.apache.beam". At the earliest, this
could occur sometime tomorrow afternoon (Pacific time).

Unfortunately, there's no way to do this without affecting ongoing work.
We'll try to do it as quickly as possible to minimize such impact.

We'll ensure that existing automated testing passes before merging the
change, with the exception of integration coverage with the Google Cloud
Dataflow service. We expect that the code in Beam's master will not work
against Cloud Dataflow for a little bit -- we'll accept this breakage on a
one-time basis, and try to recover it as soon as possible thereafter.

If anybody sees any issues with this plan, I'd love to hear it.

This is one of those mandatory things we've been delaying for a while now.
Of course, there are more such things to come, but hopefully none that are
this wide.

Thanks!
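
For anyone rebasing pending work, the effect of the rename on a single
source line can be sketched as a plain prefix swap. This is only an
assumption about the shape of the change: the actual pull request may move
or reorganize some packages beyond a simple prefix replacement, and the
class and method names below are hypothetical helpers.

```java
public class PackageRenameSketch {

    // Prefixes taken from the announcement above.
    static final String OLD_PREFIX = "com.google.cloud.dataflow";
    static final String NEW_PREFIX = "org.apache.beam";

    /**
     * Rewrites occurrences of the old package prefix followed by a dot.
     * Requiring the trailing dot avoids touching unrelated names that merely
     * start with the same characters.
     */
    static String renameLine(String line) {
        return line.replace(OLD_PREFIX + ".", NEW_PREFIX + ".");
    }

    public static void main(String[] args) {
        String before = "import com.google.cloud.dataflow.sdk.Pipeline;";
        System.out.println(renameLine(before));
    }
}
```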


Re: [PROPOSAL] Nightly builds by Jenkins

2016-04-08 Thread Davor Bonaci
This is already done, since it was trivial and INFRA was super-fast  ;)

I'll send a separate email with usage instructions.

On Fri, Apr 8, 2016 at 9:59 AM, Davor Bonaci  wrote:

> I was hoping we could get the nightly build even before that point.
>
> INFRA-11623 is asking for Nexus access for Beam.
>
> On Fri, Apr 8, 2016 at 9:03 AM, Jean-Baptiste Onofré 
> wrote:
>
>> As it seems that we have an agreement, once the renaming task is
>> complete, I will set up a nightly build on Jenkins.
>>
>> Thanks !
>> Regards
>> JB
>>
>>
>> On 04/05/2016 07:50 AM, Jean-Baptiste Onofré wrote:
>>
>>> Hi beamers,
>>>
>>> Now, on Jenkins, we have three jobs:
>>>
>>> - beam_PreCommit does a mvn clean verify for each opened PR
>>> - beam_MavenVerify does a mvn clean verify on master branch
>>> - beam_RunnableOnService_GoogleCloudDataflow does a mvn clean verify
>>> -PDataflowPipelineTests on master branch
>>>
>>> As discussed last week, Davor and I are working on renaming (especially
>>> package).
>>>
>>> Once this renaming is done (it should take a week or so), I propose to
>>> change beam_MavenVerify to beam_Nightly: it will do a mvn clean deploy
>>> deploying SNAPSHOTs on the Apache SNAPSHOT repo (deploy phase includes
>>> verify and test of course) with a schedule every night and SCM change.
>>>
>>> It will allow people to test and try beam without building.
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


Re: [PROPOSAL] Nightly builds by Jenkins

2016-04-08 Thread Davor Bonaci
I was hoping we could get the nightly build even before that point.

INFRA-11623 is asking for Nexus access for Beam.

On Fri, Apr 8, 2016 at 9:03 AM, Jean-Baptiste Onofré 
wrote:

> As it seems that we have an agreement, once the renaming task is complete,
> I will set up a nightly build on Jenkins.
>
> Thanks !
> Regards
> JB
>
>
> On 04/05/2016 07:50 AM, Jean-Baptiste Onofré wrote:
>
>> Hi beamers,
>>
>> Now, on Jenkins, we have three jobs:
>>
>> - beam_PreCommit does a mvn clean verify for each opened PR
>> - beam_MavenVerify does a mvn clean verify on master branch
>> - beam_RunnableOnService_GoogleCloudDataflow does a mvn clean verify
>> -PDataflowPipelineTests on master branch
>>
>> As discussed last week, Davor and I are working on renaming (especially
>> package).
>>
>> Once this renaming is done (it should take a week or so), I propose to
>> change beam_MavenVerify to beam_Nightly: it will do a mvn clean deploy
>> deploying SNAPSHOTs on the Apache SNAPSHOT repo (deploy phase includes
>> verify and test of course) with a schedule every night and SCM change.
>>
>> It will allow people to test and try beam without building.
>>
>> Thoughts ?
>>
>> Regards
>> JB
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Releases schedule

2016-04-08 Thread Davor Bonaci
Sounds like a plan!

On Fri, Apr 8, 2016 at 9:34 AM, Jean-Baptiste Onofré 
wrote:

> Yes, agreed.
>
> The runners don't have to all update to Runner API on 0.2.0-incubating.
> They can do it on 0.3.0 or 0.4.0.
>
> Regards
> JB
>
>
> On 04/08/2016 06:30 PM, Amit Sela wrote:
>
>> +1 on first release. Maybe after 0.2.0-incubating we should give more time
>> for runner developers to adjust to new APIs, or consider this work as
>> ongoing during 0.3.0 and 0.4.0 (incubating).
>>
>> On Fri, Apr 8, 2016, 12:20 Lukasz Cwik  wrote:
>>
>> +1 for 0.1.0-incubating release
>>> unsure that progress and timelines will match up for the others
>>>
>>> On Fri, Apr 8, 2016 at 8:47 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> Hi beamers !

>>>> In order to give some visibility, I would like to propose the following
>>>> releases schedule:
>>>>
>>>> 0.1.0-incubating (on vote May 6th): first release with code cleanup and
>>>> renaming (with only org.apache.beam packages). The purpose of this
>>>> release is to test a first release, check the legal, and ask review from
>>>> the Incubator PMC.
>>>>
>>>> 0.2.0-incubating (on vote July 1st): adding the new Runner and IO APIs.
>>>> Including new IOs and SDKs/DSLs.
>>>>
>>>> 0.3.0-incubating (on vote July 29th): stabilization and bug fixing, new
>>>> IOs and SDKs/DSLs. Breaking changes can be acceptable there.
>>>>
>>>> 0.4.0-incubating (on vote August 26th): stabilization and bug fixing,
>>>> new IOs and SDKs/DSLs.
>>>>
>>>> We can target graduation for September.
>>>>
>>>> WDYT ?
>>>>
>>>> Regards
>>>> JB
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com


>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [PROPOSAL] Releases schedule

2016-04-08 Thread Davor Bonaci
I think it is very hard to commit to specific dates right now.

Certainly, the bar for the first release is set -- code cleanup and
renaming completed. There's plenty of uncertainty here -- unclear about the
specific date, but the ballpark sounds about right.

The new Runner / Fn API is the biggest piece of work -- unclear when this
may land, but the ballpark would probably be in early summer. In terms of
new SDKs/DSLs, no plans there. There'll be new IOs, Kafka at a minimum.

I think we should commit to the following:

   - Two releases during Q2:
      - The first release in early-to-mid May, with renaming completed.
      - The second release by the end of June -- payload unclear.
   - A release every 1-2 months after that.
   - Incubator graduation in early Q4.
   - The first stable release in Q4, with a promise of future
   backward-compatibility from that point onward.

I think this is very much aligned with what JB proposed, but with slightly
different guarantees.

On Fri, Apr 8, 2016 at 9:20 AM, Lukasz Cwik 
wrote:

> +1 for 0.1.0-incubating release
> unsure that progress and timelines will match up for the others
>
> On Fri, Apr 8, 2016 at 8:47 AM, Jean-Baptiste Onofré 
> wrote:
>
> > Hi beamers !
> >
> > In order to give some visibility, I would like to propose the following
> > releases schedule:
> >
> > 0.1.0-incubating (on vote May 6th): first release with code cleanup and
> > renaming (with only org.apache.beam packages). The purpose of this
> > release is to test a first release, check the legal, and ask review from
> > the Incubator PMC.
> >
> > 0.2.0-incubating (on vote July 1st): adding the new Runner and IO APIs.
> > Including new IOs and SDKs/DSLs.
> >
> > 0.3.0-incubating (on vote July 29th): stabilization and bug fixing, new
> > IOs and SDKs/DSLs. Breaking changes can be acceptable there.
> >
> > 0.4.0-incubating (on vote August 26th): stabilization and bug fixing, new
> > IOs and SDKs/DSLs.
> >
> > We can target graduation for September.
> >
> > WDYT ?
> >
> > Regards
> > JB
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

