Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Kam Kasravi
+1 slack

On Wednesday, May 18, 2016, Manu Zhang  wrote:

> +1 for slack.
>
> On Wed, May 18, 2016 at 4:41 PM Jean-Baptiste Onofré wrote:
>
> > Good point Robert.
> >
> > I will be on the channel for sure (I'm already on a bunch of Apache IRC
> > channels ;)).
> >
> > Regards
> > JB
> >
> > On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
> > > The value in such a channel is highly dependent on people regularly
> > > being there--do we have a critical mass of developers that would hang
> > > out there? If so, I'd say go for it.
> > >
> > > On Wed, May 18, 2016 at 12:51 AM, Amit Sela wrote:
> > >> +1 for Slack
> > >>
> > >> On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> What do you think about creating a #apache-beam IRC channel on
> > >>> freenode? Or, if it's more convenient, a channel on Slack?
> > >>>
> > >>> Regards
> > >>> JB
> > >>> --
> > >>> Jean-Baptiste Onofré
> > >>> jbono...@apache.org 
> > >>> http://blog.nanthrax.net
> > >>> Talend - http://www.talend.com
> > >>>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: [DISCUSS] Developing new components -- branches, maturity, and committers

2016-05-18 Thread Seetharam Venkatesh
+1, this is a step in the right direction.

On Wed, May 18, 2016 at 10:02 PM Frances Perry 
wrote:

> Hi Beamers --
>
> I’m thrilled by the recent energy and activity on writing new Beam runners!
> But that also means it’s probably time for us to figure out how, as a
> community, we want to support this process. ;-)
>
> Back near the beginning, we had a thread [1] discussing that feature
> branches are the preferred way of doing development of features or
> components that may take a while to reach maturity. I think new components
> like runners and SDKs meet the bar to be started from a feature branch.
> (Other features, like an IO connector or library of PTransforms, might also
> qualify depending on complexity.)
>
> We should also lay out what it takes to be considered mature enough to be
> merged into master, since once that happens the component gets released to
> users and failing tests become blocking issues. Here are some initial
> thoughts to kick off the discussion...
>
> In order to be merged into master, new components / major features should:
>
>    - have at least 2 contributors interested in maintaining it, and 1
>      committer interested in supporting it
>    - provide both end-user and developer-facing documentation
>    - have at least a basic level of unit test coverage
>    - run all existing applicable integration tests with other Beam components
>      and create additional tests as appropriate
>
>
> In addition...
>
> A runner should:
>
>    - be able to handle a subset of the model that addresses a significant set
>      of use cases (e.g. ‘traditional batch’ or ‘processing time streaming’)
>    - update the capability matrix with the current status
>
>
> An SDK* should:
>
>    - provide the ability to construct graphs with all the basic building
>      blocks of the model (ParDo, GroupByKey, Window, Trigger, etc.)
>    - begin fleshing out the common composite transforms (Count, Join, etc.)
>      and IO connectors (Text, Kafka, etc.)
>    - have at least one runner that can execute the complete model (may be a
>      direct runner)
>    - provide integration tests for executing against current and future
>      runners
>
>
> * A note on DSLs:  I think it’s important to separate out an SDK from a
> DSL, because in my mind the former is by definition equivalent to the Beam
> model, while the latter may select portions of the model or change the
> user-visible abstractions in order to provide a domain-specific experience.
> We may want to encourage some DSLs to live separately from Beam because
> they may look completely non-Beam-like to their end users. But we can
> probably punt this decision until we have concrete examples to discuss.
>
> Another fun part of this growth is that we’ll likely grow new committers.
> And given the breadth of Beam, I think it would be useful to annotate our
> committers [2] page with which components folks are the most knowledgeable
> about.
>
> Looking forward to your thoughts.
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201602.mbox/%3CCAAzyFAymVNpjQgZdz2BoMknnE3H9rYRbdnUemamt9Pavw8ugsw%40mail.gmail.com%3E
>
> [2] http://beam.incubator.apache.org/team/
>


Dynamic work rebalancing for Beam

2016-05-18 Thread Dan Halperin
Hey folks,

This morning, my colleagues Eugene & Malo posted *No shard left behind:
dynamic work rebalancing in Google Cloud Dataflow*. This article discusses
Cloud Dataflow’s solution to the well-known straggler problem.

In a large batch processing job with many tasks executing in parallel, some
of the tasks – the stragglers – can take a much longer time to complete
than others, perhaps due to imperfect splitting of the work into parallel
chunks when issuing the job. Typically, waiting for stragglers means that
the overall job completes later than it should, and may also reserve too
many machines that may be underutilized at the end. Cloud Dataflow’s
dynamic work rebalancing can mitigate stragglers in most cases.

What I’d like to highlight for the Apache Beam (incubating) community is
that Cloud Dataflow’s dynamic work rebalancing is implemented using
*runner-specific* control logic on top of Beam’s *runner-independent*
BoundedSource API.
Specifically, to steal work from a straggler, a runner need only call the
reader’s splitAtFraction method. This will generate a new source containing
leftover work, and then the runner can pass that source off to another idle
worker. As Beam matures, I hope that other runners are interested in
figuring out whether these APIs can help them improve performance,
implementing dynamic work rebalancing, and collaborating on API changes
that will help solve other pain points.
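
To make the work-stealing step concrete, here is a minimal sketch in Python. It
is not the Beam API: RangeSource, RangeReader, and split_at_fraction are
hypothetical stand-ins for a bounded source over an integer range, its reader,
and the reader's splitAtFraction method. It assumes the fraction is measured
over the reader's current range and that a split is refused when it would give
away records that were already read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeSource:
    """Toy bounded source over the half-open integer range [start, end)."""
    start: int
    end: int


class RangeReader:
    """Toy reader that returns one integer per call to advance()."""

    def __init__(self, source: RangeSource):
        self.start = source.start
        self.end = source.end              # shrinks when work is given away
        self.position = source.start - 1   # last record returned so far

    def advance(self) -> Optional[int]:
        """Return the next record, or None when the range is exhausted."""
        if self.position + 1 >= self.end:
            return None
        self.position += 1
        return self.position

    def split_at_fraction(self, fraction: float) -> Optional[RangeSource]:
        """Give away the tail of the current range, or return None if refused."""
        split = self.start + int(fraction * (self.end - self.start))
        # Refuse splits that would hand off already-read records or nothing.
        if split <= self.position or split >= self.end:
            return None
        residual = RangeSource(split, self.end)
        self.end = split                   # this reader now stops at the split
        return residual


# A "runner" steals the second half of a straggler's work.
straggler = RangeReader(RangeSource(0, 100))
for _ in range(10):
    straggler.advance()
leftover = straggler.split_at_fraction(0.5)   # RangeSource(start=50, end=100)
```

In this sketch the straggler keeps reading records 10 through 49, while the
returned source covering 50 through 99 can be handed to an idle worker. A second
request at fraction 0.1 would be refused, because that point lies behind the
reader's current position.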

Dan

(Also posted on the Beam blog:
http://beam.incubator.apache.org/blog/2016/05/18/splitAtFraction-method.html)


[DISCUSS] Developing new components -- branches, maturity, and committers

2016-05-18 Thread Frances Perry
Hi Beamers --

I’m thrilled by the recent energy and activity on writing new Beam runners!
But that also means it’s probably time for us to figure out how, as a
community, we want to support this process. ;-)

Back near the beginning, we had a thread [1] discussing that feature
branches are the preferred way of doing development of features or
components that may take a while to reach maturity. I think new components
like runners and SDKs meet the bar to be started from a feature branch.
(Other features, like an IO connector or library of PTransforms, might also
qualify depending on complexity.)

We should also lay out what it takes to be considered mature enough to be
merged into master, since once that happens the component gets released to
users and failing tests become blocking issues. Here are some initial
thoughts to kick off the discussion...

In order to be merged into master, new components / major features should:

   - have at least 2 contributors interested in maintaining it, and 1
     committer interested in supporting it
   - provide both end-user and developer-facing documentation
   - have at least a basic level of unit test coverage
   - run all existing applicable integration tests with other Beam components
     and create additional tests as appropriate


In addition...

A runner should:

   - be able to handle a subset of the model that addresses a significant set
     of use cases (e.g. ‘traditional batch’ or ‘processing time streaming’)
   - update the capability matrix with the current status


An SDK* should:

   - provide the ability to construct graphs with all the basic building
     blocks of the model (ParDo, GroupByKey, Window, Trigger, etc.)
   - begin fleshing out the common composite transforms (Count, Join, etc.)
     and IO connectors (Text, Kafka, etc.)
   - have at least one runner that can execute the complete model (may be a
     direct runner)
   - provide integration tests for executing against current and future
     runners


* A note on DSLs:  I think it’s important to separate out an SDK from a
DSL, because in my mind the former is by definition equivalent to the Beam
model, while the latter may select portions of the model or change the
user-visible abstractions in order to provide a domain-specific experience.
We may want to encourage some DSLs to live separately from Beam because
they may look completely non-Beam-like to their end users. But we can
probably punt this decision until we have concrete examples to discuss.

Another fun part of this growth is that we’ll likely grow new committers.
And given the breadth of Beam, I think it would be useful to annotate our
committers [2] page with which components folks are the most knowledgeable
about.

Looking forward to your thoughts.

[1]
http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201602.mbox/%3CCAAzyFAymVNpjQgZdz2BoMknnE3H9rYRbdnUemamt9Pavw8ugsw%40mail.gmail.com%3E

[2] http://beam.incubator.apache.org/team/


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Manu Zhang
+1 for slack.

On Wed, May 18, 2016 at 4:41 PM Jean-Baptiste Onofré 
wrote:

> Good point Robert.
>
> I will be on the channel for sure (I'm already on a bunch of Apache IRC
> channels ;)).
>
> Regards
> JB
>
> On 05/18/2016 10:26 AM, Robert Bradshaw wrote:
> > The value in such a channel is highly dependent on people regularly
> > being there--do we have a critical mass of developers that would hang
> > out there? If so, I'd say go for it.
> >
> > On Wed, May 18, 2016 at 12:51 AM, Amit Sela wrote:
> >> +1 for Slack
> >>
> >> On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> What do you think about creating a #apache-beam IRC channel on freenode?
> >>> Or, if it's more convenient, a channel on Slack?
> >>>
> >>> Regards
> >>> JB
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [UPDATE] 0.1-incubating version created in Jira

2016-05-18 Thread Jean-Baptiste Onofré

Thanks for the version number fix.

So it means that later, if a user checks Jira, they will have to
know that blank means "fixed in 0.1.0-incubating".


If we agree about that, that's fine (I prefer an explicit fix version,
but I don't mind ;)).


Regards
JB

On 05/18/2016 06:05 PM, Davor Bonaci wrote:

The version number is actually 0.1.0-incubating. Updated accordingly.

Release notes aren't particularly useful for the first release, since
there's no baseline established. We can just have a blank statement saying
"first release" or something similar.

On Wed, May 18, 2016 at 8:19 AM, Jean-Baptiste Onofré 
wrote:


Hi all,

I created the 0.1-incubating version in Jira.

As we are going to use it to generate the RELEASE NOTES, it would be great
if you could review the open Jira issues and set the fix version field on
any you want to include in the first 0.1-incubating release.

I plan to do a bulk change to assign this fix version to all resolved
issues.

Thanks !
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com





--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Build failed in Jenkins: beam_Release_NightlySnapshot #45

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[mxm] [flink] replace obsolete reflection call

[dhalperi] Pub/sub unbounded source

--
[...truncated 3875 lines...]
[INFO] Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java8tests-all/0.1.0-incubating-SNAPSHOT/java8tests-all-0.1.0-incubating-20160518.071006-40-tests.jar
 (40 KB at 59.8 KB/sec)
[INFO] Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java8tests-all/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
[INFO] Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/java8tests-all/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
 (2 KB at 2.4 KB/sec)
[INFO] 
[INFO] 
[INFO] Building Apache Beam :: Runners 0.1.0-incubating-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ runners-parent ---
[INFO] Deleting 

[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce) @ runners-parent ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ runners-parent 
---
[INFO] 
[INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ 
runners-parent ---
[INFO] 
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
runners-parent ---
[INFO] Installing 
 
to 

[INFO] 
[INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ runners-parent 
---
[INFO] Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
[INFO] Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
 (630 B at 1.0 KB/sec)
[INFO] Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/runners-parent-0.1.0-incubating-20160518.071208-40.pom
[INFO] Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/runners-parent-0.1.0-incubating-20160518.071208-40.pom
 (3 KB at 4.9 KB/sec)
[INFO] Downloading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/maven-metadata.xml
[INFO] Downloaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/maven-metadata.xml
 (300 B at 0.7 KB/sec)
[INFO] Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
[INFO] Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/0.1.0-incubating-SNAPSHOT/maven-metadata.xml
 (630 B at 1.2 KB/sec)
[INFO] Uploading: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/maven-metadata.xml
[INFO] Uploaded: 
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/runners-parent/maven-metadata.xml
 (300 B at 0.4 KB/sec)
[INFO] 
[INFO] 
[INFO] Building Apache Beam :: Runners :: Google Cloud Dataflow 
0.1.0-incubating-SNAPSHOT
[INFO] 
[INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/apis/google-api-services-dataflow/v1b3-rev26-1.22.0/google-api-services-dataflow-v1b3-rev26-1.22.0.pom
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] I/O exception (java.net.SocketException) caught when processing request 
to {s}->https://repo.maven.apache.org:443: Connection reset
[INFO] Retrying request to {s}->https://repo.maven.apache.org:443
[INFO] Downloading: 
https://repo.maven.apache.org/maven2/com/google/apis/google-api-services-clouddebugger/v2-rev8-1.22.0/google-api-services-clouddebugger-v2-rev8-1.22.0.pom
[INFO] I/O exception (java

[UPDATE] 0.1-incubating version created in Jira

2016-05-18 Thread Jean-Baptiste Onofré

Hi all,

I created the 0.1-incubating version in Jira.

As we are going to use it to generate the RELEASE NOTES, it would be
great if you could review the open Jira issues and set the fix version
field on any you want to include in the first 0.1-incubating release.


I plan to do a bulk change to assign this fix version to all resolved
issues.


Thanks !
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Jean-Baptiste Onofré

Good point Robert.

I will be on the channel for sure (I'm already on a bunch of Apache IRC
channels ;)).


Regards
JB

On 05/18/2016 10:26 AM, Robert Bradshaw wrote:

The value in such a channel is highly dependent on people regularly
being there--do we have a critical mass of developers that would hang
out there? If so, I'd say go for it.

On Wed, May 18, 2016 at 12:51 AM, Amit Sela  wrote:

+1 for Slack

On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré 
wrote:


Hi all,

What do you think about creating a #apache-beam IRC channel on freenode?
Or, if it's more convenient, a channel on Slack?

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Robert Bradshaw
The value in such a channel is highly dependent on people regularly
being there--do we have a critical mass of developers that would hang
out there? If so, I'd say go for it.

On Wed, May 18, 2016 at 12:51 AM, Amit Sela  wrote:
> +1 for Slack
>
> On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi all,
>>
>> What do you think about creating a #apache-beam IRC channel on freenode?
>> Or, if it's more convenient, a channel on Slack?
>>
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>


Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Amit Sela
+1 for Slack

On Wed, May 18, 2016 at 10:47 AM Jean-Baptiste Onofré 
wrote:

> Hi all,
>
> What do you think about creating a #apache-beam IRC channel on freenode?
> Or, if it's more convenient, a channel on Slack?
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[PROPOSAL] IRC or slack channel for Apache Beam

2016-05-18 Thread Jean-Baptiste Onofré

Hi all,

What do you think about creating a #apache-beam IRC channel on freenode?
Or, if it's more convenient, a channel on Slack?


Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com