Re: Committer?

2016-03-10 Thread Jean-Baptiste Onofré

Hi,

the code is on:

https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=summary

with a mirror (for PR) on github:

https://github.com/apache/incubator-beam

You don't ask to be a committer: you have to deserve to be one. So, you
have to provide PRs/patches, participate in the documentation and on the
mailing lists, etc.


After a certain time (which can be long), you can be proposed and elected
as a new committer.


It's the way Apache works: meritocracy.

Regards
JB

On 03/10/2016 08:18 PM, Srikumar Chari wrote:

Hey there – am very excited about the Beam project.

Was wondering how can I get the latest updates, code?

Also, I was wondering if the team is looking for additional committers?
—
Thanks
Sri
Chief Architect, [24]7 Inc



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Committer?

2016-03-10 Thread Srikumar Chari
Hey there – am very excited about the Beam project.

Was wondering how can I get the latest updates, code?

Also, I was wondering if the team is looking for additional committers?
—
Thanks
Sri
Chief Architect, [24]7 Inc


Sorry for un-fixed-up PR merge

2016-03-10 Thread Kenneth Knowles
I want to apologize for leaving fixup commits in a PR merge I just
performed. I'm leaving as-is rather than mess about with `git push -f` to
rewrite a prettier history. Just don't want anyone to think that I would
normally go about like that.

Kenn


Re: Travis for pull requests

2016-03-10 Thread Kostas Kloudas
That is great news!

Thanks Davor!

> On Mar 10, 2016, at 10:20 AM, Amit Sela  wrote:
> 
> Thanks Davor!
> 
> On Thu, Mar 10, 2016, 11:15 Maximilian Michels  wrote:
> 
>> Well done :)
>> 
>> About the Flink tests in Jenkins: I wonder why they don't execute.
>> Just had a look at the Jenkins job. They seem to run fine:
>> 
>> https://builds.apache.org/job/beam_MavenVerify/35/org.apache.beam$flink-runner/console
>> 
>> On Thu, Mar 10, 2016 at 7:40 AM, Jean-Baptiste Onofré 
>> wrote:
>>> Awesome ! Thanks Davor.
>>> 
>>> Regards
>>> JB
>>> 
>>> 
>>> On 03/10/2016 01:10 AM, Davor Bonaci wrote:
 
 I'm happy to announce that we now have both Travis and Jenkins set up in
 Beam.
 
 Both systems are building our master branch. The most recent status is
 incorporated into the top-level README.md file. Clicking the badge will
 take you to the specific build results. Additionally, we have automatic
 coverage for each pull request, with results integrated into the GitHub
 pull request UI.
 
 Exciting!
 
 Low-level details:
 The systems aren't exactly equal. Travis will run on any branch, while
 Jenkins will run on master only. Travis will run multi-OS, multi-JDK
 version, while Jenkins does just one combination. Notifications to Travis
 are pushed, Jenkins periodically polls for changes. Flink tests may not be
 running in Jenkins right now -- we need to investigate why.
 
 On Wed, Mar 9, 2016 at 8:57 AM, Davor Bonaci  wrote:
 
> Sounds like we are all in agreement. Great!
> 
> On Wed, Mar 9, 2016 at 8:49 AM, Jean-Baptiste Onofré 
> wrote:
> 
>> I agree, and it's what I mean (assuming the signing is OK).
>> 
>> Basically, a release requires the following actions:
>> 
>> - mvn release:prepare && mvn release:perform (with pgp signing, etc): it
>> can be done by Jenkins, BUT it requires some credentials in
>> .m2/settings.xml (for signing and upload on nexus), etc. In a lot of Apache
>> projects, you have some guys dedicated to the releases, and a release is
>> simply a single command line to execute (or a procedure to follow)
>> - check the release content (human)
>> - close the staging repository on nexus (human)
>> - send the vote e-mail (human)
>> - once the vote passed:
>> -- promote the staging repo (human)
>> -- update Jira (human)
>> -- publish artifacts on dist.apache.org (human)
>> -- update reporter.apache.org (human)
>> -- send announcement e-mail on the mailing lists (human)
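
As a rough sketch of the first step in the list above (the only one described as something Jenkins could run), a release manager could wrap the two Maven goals in a small script. This is only an illustration, not something from the thread: the `run` and `release` helper names and the `DRY_RUN` variable are made up for this example, and it assumes Maven plus the signing and Nexus credentials in .m2/settings.xml are already in place. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Illustrative helper (hypothetical, not part of Beam):
# print the command unless DRY_RUN is explicitly set to something other than 1.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The two Maven goals named in the list above, in order.
# release:perform is only attempted if release:prepare succeeded.
release() {
  run mvn release:prepare && run mvn release:perform
}
```

Calling `release` as-is prints the two commands; setting DRY_RUN=0 before calling it would actually invoke Maven. The remaining steps in the list are marked as human actions, so they are deliberately left out of the script.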
>> 
>> Regards
>> JB
>> 
>> 
>> On 03/09/2016 05:38 PM, Davor Bonaci wrote:
>> 
>>> I think a release manager (a person) should be driving it, but his/her
>>> actions can still be automated through Jenkins. For example, a Jenkins job
>>> that release manager manually triggers is often better than a set of manual
>>> command-line actions. Reasons: less error prone, repeatable, log of actions
>>> is kept and is visible to everyone, etc.
>>> 
>>> On Wed, Mar 9, 2016 at 1:25 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>> 
>>> Hi Max,
 
 I agree to use Jenkins for snapshots, but I don't think it's a good idea
 for release (it's better that a release manager does it IMHO).
 
 Regards
 JB
 
 
 On 03/09/2016 10:12 AM, Maximilian Michels wrote:
 
> I'm in favor of Travis too. We use it very extensively at Flink. It is
> true that Jenkins can provide a much more sophisticated workflow.
> However, its UI is outdated and it is not as nicely integrated with
> GitHub. For outside contributions, IMHO Travis is the best CI system.
> 
> We might actually use Jenkins for releases or snapshot deployment.
> Jenkins is very flexible and nicely integrated with the ASF
> infrastructure which makes some things like providing credentials a
> piece of cake.
> 
> Thanks for getting us started @Davor.
> 
> On Tue, Mar 8, 2016 at 6:35 PM, Davor Bonaci wrote:
> 
>> We absolutely could -- that's why we forked over Dataflow's Travis
>> configuration to start with. With Max's recent fixes to the Flink runner,
>> this is very viable.
>> 
>> Travis vs. Jenkins is often a contentious discussion. Common arguments
>> against Travis are: scalability / capacity, hard to schedule periodic runs,
>> and inability to automate the release process. There are many pros too;

Re: Using beam sdk for standalone implementations without connecting to the cloud

2016-03-10 Thread Jean-Baptiste Onofré

Interesting, it makes sense.

Thanks for sharing !

Regards
JB

On 03/10/2016 10:32 AM, Minudika Malshan wrote:

Hi JB,

Thanks a lot for your kind attention. I'm very happy to take your advice
on this implementation. :)

I am planning to do this for GSoC 2016 since it has been published as a
project idea this year.
Here is the plan in brief.

The user should be able to implement pipelines using the commands provided
by the Beam SDK (Dataflow SDK) from a Zeppelin notebook.
Then the Beam interpreter should be able to interpret and execute Beam SDK
commands at the back-end and give the output.
Since Beam provides only an SDK for Java, I am going to use Java-REPL
to interpret the Java commands provided by the SDK at the Zeppelin back-end.

I will create a draft proposal for this implementation and share it with
you. Would like to have your comments on it.

Thanks and regards.
Minudika


Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 2:39 PM, Jean-Baptiste Onofré 
wrote:


Hi Minudika,

Oh, interesting for Zeppelin. What do you plan to do ? Implement the
zeppelin notebook backend with Beam (the zeppelin analytics would be
implemented as beam pipelines) ? I would be happy to help if you need.

Regards
JB


On 03/10/2016 09:47 AM, Minudika Malshan wrote:


Hi,

This is related with the implementation of a beam interpreter for Apache
zeppelin. I think for the first phase, DirectPipelineRunner will do the
job
:)
Please let me know if there is anything which can be helpful.

Thanks and regards.
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré 
wrote:

By the way, on my side, I will work on a Karaf/OSGi (

http://karaf.apache.org) runner for Beam (with shell commands, features,
etc).
I will start it just after the work on new IOs.

Regards
JB


On 03/09/2016 08:01 PM, Minudika Malshan wrote:

Hi,


Thanks a lot for your quick responses.
I will refer those resources.

Regards,
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik .

The FlinkPipelineRunner, which
can be used to execute locally or on a Flink cluster.

There is also ongoing work to bring Spark into the mix as a runner,
and suggestions for other runners such as GearPump.

On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan <minudika...@gmail.com>
wrote:

Hi all,

As far as I know about Apache Beam and the Dataflow SDK, the Dataflow SDK
was at first developed targeting Google Cloud Platform,
so we have to deploy pipelines in the cloud.

But my question is: can't we use this SDK for standalone implementations
without the cloud? If so, I would love to have a look at some examples of
such implementations.

Your kind help is much appreciated.

Regards,
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.






--

Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com







Re: Using beam sdk for standalone implementations without connecting to the cloud

2016-03-10 Thread Minudika Malshan
Hi JB,

Thanks a lot for your kind attention. I'm very happy to take your advice
on this implementation. :)

I am planning to do this for GSoC 2016 since it has been published as a
project idea this year.
Here is the plan in brief.

The user should be able to implement pipelines using the commands provided
by the Beam SDK (Dataflow SDK) from a Zeppelin notebook.
Then the Beam interpreter should be able to interpret and execute Beam SDK
commands at the back-end and give the output.
Since Beam provides only an SDK for Java, I am going to use Java-REPL
to interpret the Java commands provided by the SDK at the Zeppelin back-end.

I will create a draft proposal for this implementation and share it with
you. Would like to have your comments on it.

Thanks and regards.
Minudika


Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 2:39 PM, Jean-Baptiste Onofré 
wrote:

> Hi Minudika,
>
> Oh, interesting for Zeppelin. What do you plan to do ? Implement the
> zeppelin notebook backend with Beam (the zeppelin analytics would be
> implemented as beam pipelines) ? I would be happy to help if you need.
>
> Regards
> JB
>
>
> On 03/10/2016 09:47 AM, Minudika Malshan wrote:
>
>> Hi,
>>
>> This is related with the implementation of a beam interpreter for Apache
>> zeppelin. I think for the first phase, DirectPipelineRunner will do the
>> job
>> :)
>> Please let me know if there is anything which can be helpful.
>>
>> Thanks and regards.
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa
>> Sri Lanka.
>>
>>
>>
>>
>> On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>> By the way, on my side, I will work on a Karaf/OSGi (
>>> http://karaf.apache.org) runner for Beam (with shell commands, features,
>>> etc).
>>> I will start it just after the work on new IOs.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 03/09/2016 08:01 PM, Minudika Malshan wrote:
>>>
>>> Hi,

 Thanks a lot for your quick responses.
 I will refer those resources.

 Regards,
 Minudika

 Minudika Malshan
 Undergraduate
 Department of Computer Science and Engineering
 University of Moratuwa
 Sri Lanka.




 On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik 
 wrote:

 There are currently two implementations which do not require the cloud:

>
> The DirectPipelineRunner
> <
>
>
> https://github.com/apache/incubator-beam/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java
>
>
>> which is mainly used for testing and local development. This runner
>> has
>>
> several limits (data size, no support for unbounded collections, ...)
> and
> is being expanded to support more use cases, for example adding
> unbounded
> PCollection support .
>
> The FlinkPipelineRunner
> 
> which
> can be used to execute locally or on a Flink cluster.
>
> There is also ongoing work to bring Spark
>  into the mix as a
> runner
> and
> suggestions to for other runners such as GearPump
> .
>
> On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan <
> minudika...@gmail.com
>
>>
>> wrote:
>
> Hi all,
>
>>
>> As per my knowledge about Apache beam and data flow sdk,  at the first
>>
>> data
>
> flow sdk has been developed targeting google cloud platform.
>> So we have to deploy pipelines in the cloud.
>>
>> But my question is, can not we use this sdk for standalone
>>
>> implementations
>
> without cloud. If so, I would love to have a look at some examples of
>>
>> such
>
> implementations.
>> Your kind help is much appreciated.
>>
>> Regards,
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa
>> Sri Lanka.
>>
>>
>>
>
 --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Travis for pull requests

2016-03-10 Thread Amit Sela
Thanks Davor!

On Thu, Mar 10, 2016, 11:15 Maximilian Michels  wrote:

> Well done :)
>
> About the Flink tests in Jenkins: I wonder why they don't execute.
> Just had a look at the Jenkins job. They seem to run fine:
>
> https://builds.apache.org/job/beam_MavenVerify/35/org.apache.beam$flink-runner/console
>
> On Thu, Mar 10, 2016 at 7:40 AM, Jean-Baptiste Onofré 
> wrote:
> > Awesome ! Thanks Davor.
> >
> > Regards
> > JB
> >
> >
> > On 03/10/2016 01:10 AM, Davor Bonaci wrote:
> >>
> >> I'm happy to announce that we now have both Travis and Jenkins set up in
> >> Beam.
> >>
> >> Both systems are building our master branch. The most recent status is
> >> incorporated into the top-level README.md file. Clicking the badge will
> >> take you to the specific build results. Additionally, we have automatic
> >> coverage for each pull request, with results integrated into the GitHub
> >> pull request UI.
> >>
> >> Exciting!
> >>
> >> Low-level details:
> >> The systems aren't exactly equal. Travis will run on any branch, while
> >> Jenkins will run on master only. Travis will run multi-OS, multi-JDK
> >> version, while Jenkins does just one combination. Notifications to Travis
> >> are pushed, Jenkins periodically polls for changes. Flink tests may not be
> >> running in Jenkins right now -- we need to investigate why.
> >>
> >> On Wed, Mar 9, 2016 at 8:57 AM, Davor Bonaci  wrote:
> >>
> >>> Sounds like we are all in agreement. Great!
> >>>
> >>> On Wed, Mar 9, 2016 at 8:49 AM, Jean-Baptiste Onofré 
> >>> wrote:
> >>>
>  I agree, and it's what I mean (assuming the signing is OK).
> 
>  Basically, a release requires the following actions:
> 
>  - mvn release:prepare && mvn release:perform (with pgp signing, etc): it
>  can be done by Jenkins, BUT it requires some credentials in
>  .m2/settings.xml (for signing and upload on nexus), etc. In a lot of Apache
>  projects, you have some guys dedicated to the releases, and a release is
>  simply a single command line to execute (or a procedure to follow)
>  - check the release content (human)
>  - close the staging repository on nexus (human)
>  - send the vote e-mail (human)
>  - once the vote passed:
>  -- promote the staging repo (human)
>  -- update Jira (human)
>  -- publish artifacts on dist.apache.org (human)
>  -- update reporter.apache.org (human)
>  -- send announcement e-mail on the mailing lists (human)
> 
>  Regards
>  JB
> 
> 
>  On 03/09/2016 05:38 PM, Davor Bonaci wrote:
> 
> > I think a release manager (a person) should be driving it, but his/her
> > actions can still be automated through Jenkins. For example, a Jenkins job
> > that release manager manually triggers is often better than a set of manual
> > command-line actions. Reasons: less error prone, repeatable, log of actions
> > is kept and is visible to everyone, etc.
> >
> > On Wed, Mar 9, 2016 at 1:25 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> > Hi Max,
> >>
> >>
> >> I agree to use Jenkins for snapshots, but I don't think it's a good
> >> idea
> >> for release (it's better that a release manager does it IMHO).
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 03/09/2016 10:12 AM, Maximilian Michels wrote:
> >>
> >> I'm in favor of Travis too. We use it very extensively at Flink. It
> is
> >>>
> >>> true that Jenkins can provide a much more sophisticated workflow.
> >>> However, its UI is outdated and it is not as nicely integrated with
> >>> GitHub. For outside contributions, IMHO Travis is the best CI
> system.
> >>>
> >>> We might actually use Jenkins for releases or snapshot deployment.
> >>> Jenkins is very flexible and nicely integrated with the ASF
> >>> infrastructure which makes some things like providing credentials a
> >>> piece of cake.
> >>>
> >>> Thanks for getting us started @Davor.
> >>>
> >>> On Tue, Mar 8, 2016 at 6:35 PM, Davor Bonaci
> >>>  
> 
> >>> wrote:
> >>>
> >>> We absolutely could -- that's why we forked over Dataflow's Travis
> 
>  configuration to start with. With Max's recent fixes to the Flink
>  runner,
>  this is very viable.
> 
>  Travis vs. Jenkins is often a contentious discussion. Common
>  arguments
>  against Travis are: scalability / capacity, hard to schedule
>  periodic
>  runs,
>  and inability to automate the release process. There are many pros
>  too;
>  e.g., automatic coverage on forked repositories.
> 
>  We are generally in favor of doing this through Jenkins for the
> pull
>  requests, 

Re: Using beam sdk for standalone implementations without connecting to the cloud

2016-03-10 Thread Jean-Baptiste Onofré

Hi Minudika,

Oh, interesting for Zeppelin. What do you plan to do? Implement the
Zeppelin notebook backend with Beam (the Zeppelin analytics would be
implemented as Beam pipelines)? I would be happy to help if you need it.


Regards
JB

On 03/10/2016 09:47 AM, Minudika Malshan wrote:

Hi,

This is related with the implementation of a beam interpreter for Apache
zeppelin. I think for the first phase, DirectPipelineRunner will do the job
:)
Please let me know if there is anything which can be helpful.

Thanks and regards.
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré 
wrote:


By the way, on my side, I will work on a Karaf/OSGi (
http://karaf.apache.org) runner for Beam (with shell commands, features,
etc).
I will start it just after the work on new IOs.

Regards
JB


On 03/09/2016 08:01 PM, Minudika Malshan wrote:


Hi,

Thanks a lot for your quick responses.
I will refer those resources.

Regards,
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik 
wrote:

There are currently two implementations which do not require the cloud:

The DirectPipelineRunner
<https://github.com/apache/incubator-beam/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java>,
which is mainly used for testing and local development. This runner has
several limits (data size, no support for unbounded collections, ...) and
is being expanded to support more use cases, for example adding unbounded
PCollection support.

The FlinkPipelineRunner, which
can be used to execute locally or on a Flink cluster.

There is also ongoing work to bring Spark into the mix as a runner,
and suggestions for other runners such as GearPump.

On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan 

Re: Travis for pull requests

2016-03-10 Thread Maximilian Michels
Well done :)

About the Flink tests in Jenkins: I wonder why they don't execute.
Just had a look at the Jenkins job. They seem to run fine:
https://builds.apache.org/job/beam_MavenVerify/35/org.apache.beam$flink-runner/console

On Thu, Mar 10, 2016 at 7:40 AM, Jean-Baptiste Onofré  wrote:
> Awesome ! Thanks Davor.
>
> Regards
> JB
>
>
> On 03/10/2016 01:10 AM, Davor Bonaci wrote:
>>
>> I'm happy to announce that we now have both Travis and Jenkins set up in
>> Beam.
>>
>> Both systems are building our master branch. The most recent status is
>> incorporated into the top-level README.md file. Clicking the badge will
>> take you to the specific build results. Additionally, we have automatic
>> coverage for each pull request, with results integrated into the GitHub
>> pull request UI.
>>
>> Exciting!
>>
>> Low-level details:
>> The systems aren't exactly equal. Travis will run on any branch, while
>> Jenkins will run on master only. Travis will run multi-OS, multi-JDK
>> version, while Jenkins does just one combination. Notifications to Travis
>> are pushed, Jenkins periodically polls for changes. Flink tests may not be
>> running in Jenkins right now -- we need to investigate why.
>>
>> On Wed, Mar 9, 2016 at 8:57 AM, Davor Bonaci  wrote:
>>
>>> Sounds like we are all in agreement. Great!
>>>
>>> On Wed, Mar 9, 2016 at 8:49 AM, Jean-Baptiste Onofré 
>>> wrote:
>>>
 I agree, and it's what I mean (assuming the signing is OK).

 Basically, a release requires the following actions:

 - mvn release:prepare && mvn release:perform (with pgp signing, etc): it
 can be done by Jenkins, BUT it requires some credentials in
 .m2/settings.xml (for signing and upload on nexus), etc. In a lot of Apache
 projects, you have some guys dedicated to the releases, and a release is
 simply a single command line to execute (or a procedure to follow)
 - check the release content (human)
 - close the staging repository on nexus (human)
 - send the vote e-mail (human)
 - once the vote passed:
 -- promote the staging repo (human)
 -- update Jira (human)
 -- publish artifacts on dist.apache.org (human)
 -- update reporter.apache.org (human)
 -- send announcement e-mail on the mailing lists (human)

 Regards
 JB


 On 03/09/2016 05:38 PM, Davor Bonaci wrote:

> I think a release manager (a person) should be driving it, but his/her
> actions can still be automated through Jenkins. For example, a Jenkins job
> that release manager manually triggers is often better than a set of manual
> command-line actions. Reasons: less error prone, repeatable, log of actions
> is kept and is visible to everyone, etc.
>
> On Wed, Mar 9, 2016 at 1:25 AM, Jean-Baptiste Onofré 
> wrote:
>
> Hi Max,
>>
>>
>> I agree to use Jenkins for snapshots, but I don't think it's a good
>> idea
>> for release (it's better that a release manager does it IMHO).
>>
>> Regards
>> JB
>>
>>
>> On 03/09/2016 10:12 AM, Maximilian Michels wrote:
>>
>> I'm in favor of Travis too. We use it very extensively at Flink. It is
>>>
>>> true that Jenkins can provide a much more sophisticated workflow.
>>> However, its UI is outdated and it is not as nicely integrated with
>>> GitHub. For outside contributions, IMHO Travis is the best CI system.
>>>
>>> We might actually use Jenkins for releases or snapshot deployment.
>>> Jenkins is very flexible and nicely integrated with the ASF
>>> infrastructure which makes some things like providing credentials a
>>> piece of cake.
>>>
>>> Thanks for getting us started @Davor.
>>>
>>> On Tue, Mar 8, 2016 at 6:35 PM, Davor Bonaci
>>> >> wrote:
>>>
>>> We absolutely could -- that's why we forked over Dataflow's Travis

 configuration to start with. With Max's recent fixes to the Flink
 runner,
 this is very viable.

 Travis vs. Jenkins is often a contentious discussion. Common
 arguments
 against Travis are: scalability / capacity, hard to schedule
 periodic
 runs,
 and inability to automate the release process. There are many pros
 too;
 e.g., automatic coverage on forked repositories.

 We are generally in favor of doing this through Jenkins for the pull
 requests, since that is our "official" CI. Many projects do this --
 Apache
 Thrift is one example [1]. Work on this is in-progress on our side.

 Maintaining both systems is an extra burden, but I feel we'll end up
 there
 sooner or later. Thus, I'm also in favor of enabling the coverage
 that we
 already 

Re: Using beam sdk for standalone implementations without connecting to the cloud

2016-03-10 Thread Minudika Malshan
Hi,

This is related to the implementation of a Beam interpreter for Apache
Zeppelin. I think for the first phase, DirectPipelineRunner will do the job
:)
Please let me know if there is anything which can be helpful.

Thanks and regards.
Minudika

Minudika Malshan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.




On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré 
wrote:

> By the way, on my side, I will work on a Karaf/OSGi (
> http://karaf.apache.org) runner for Beam (with shell commands, features,
> etc).
> I will start it just after the work on new IOs.
>
> Regards
> JB
>
>
> On 03/09/2016 08:01 PM, Minudika Malshan wrote:
>
>> Hi,
>>
>> Thanks a lot for your quick responses.
>> I will refer those resources.
>>
>> Regards,
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa
>> Sri Lanka.
>>
>>
>>
>>
>> On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik 
>> wrote:
>>
>>> There are currently two implementations which do not require the cloud:
>>>
>>> The DirectPipelineRunner
>>> <https://github.com/apache/incubator-beam/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java>,
>>> which is mainly used for testing and local development. This runner has
>>> several limits (data size, no support for unbounded collections, ...) and
>>> is being expanded to support more use cases, for example adding unbounded
>>> PCollection support.
>>>
>>> The FlinkPipelineRunner, which
>>> can be used to execute locally or on a Flink cluster.
>>>
>>> There is also ongoing work to bring Spark into the mix as a runner,
>>> and suggestions for other runners such as GearPump.
>>>
>>> On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan wrote:
>>>
>>>> Hi all,
>>>>
>>>> As far as I know about Apache Beam and the Dataflow SDK, the Dataflow SDK
>>>> was at first developed targeting Google Cloud Platform,
>>>> so we have to deploy pipelines in the cloud.
>>>>
>>>> But my question is: can't we use this SDK for standalone implementations
>>>> without the cloud? If so, I would love to have a look at some examples of
>>>> such implementations.
>>>>
>>>> Your kind help is much appreciated.
>>>>
>>>> Regards,
>>>> Minudika
>>>>
>>>> Minudika Malshan
>>>> Undergraduate
>>>> Department of Computer Science and Engineering
>>>> University of Moratuwa
>>>> Sri Lanka.


>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>