Re: Scio Beam Scala API

2016-12-11 Thread Jean-Baptiste Onofré

Hi guys,

I will share the branch with the 0.4.0-incubating-SNAPSHOT update and 
tested with Spark runner.


Regards
JB

On 12/11/2016 09:13 PM, Neville Li wrote:

Aviem,

You're right we have a scio/apache-beam
 branch that works with
Beam 0.2.0-incubating and are working on keeping it up with the latest
releases.

A few ways you can contribute:
- It runs with the Dataflow runner but hasn't been tested with other
runners. You're welcome to give it a try and submit issues/PRs.
- `scio-core` is somewhat coupled with Dataflow runner and GCP IO
dependencies right now but it'd be nice to further decouple them so users
can swap other runner/IO packages easily.
- We also have a master ticket #279
 that keeps track of pending
issues for Beam migration.

Keep in mind that our team of 3 supports 150+ production Scio users within
Spotify so we simply don't have the bandwidth to maintain 2 diverging repo
(spotify/scio vs apache/beam-incubating) right now. We'll probably revisit
this when internal users switch over to Beam sometime next year.


On Sun, Dec 11, 2016 at 4:45 PM Jean-Baptiste Onofré 
wrote:


Hi

I'm working on a feature branch with Neville and his guys. I already
updated to last changes. I would like to propose a feature branch later
this week.

Regards
JB⁣​

On Dec 11, 2016, 16:39, at 16:39, Aviem Zur  wrote:

Hi,

I've heard there has been work towards porting Scio Dataflow Scala API
to
Beam.
I was wondering at what stage this is in, where is this happening (Saw
no
branch in BEAM repository, and one in Scio repository that is dependent
on
beam 0.2.0-INCUBATING) and if there is a way to contribute?






--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Eugene Kirpichov
Hi Amit,

Yes, this is correct. Part of the motivation for this is that DoFn API is
user-facing, and the compressed representation of windowed elements (e.g.
access to all windows of an element), as well as the ability to emit
directly into a specified window, is an implementation detail of the runner
that is dangerous to expose to SDK users (even I got burnt by it while
working on SplittableParDo), so we would like to move WindowedValue into
runners-core and keep the semantically clean API in the SDK: access to the
current window, and assigning windows via Window.into().

On Sun, Dec 11, 2016 at 11:59 AM Kenneth Knowles 
wrote:

> You've got it right. My recommendations is to just directly implement it
> for the Spark runner. It will often actually clean things up a bit. Here's
> the analogous change for the Flink runner:
> https://github.com/apache/incubator-beam/pull/1435/files.
>
> With GABW, I tried going through the process of keeping some utility
> expansion in runners-core, making StateInternalsFactory, refactoring
> GroupAlsoByWindowsDoFn, then GroupByKeyViaGroupByKeyOnly,
> GroupAlsoByWindow. But it ended up simpler for each runner to just not use
> most of that and do it directly. (they all still share GABW but none of the
> surrounding bits, IIRC)
>
> On Sun, Dec 11, 2016 at 10:33 AM, Amit Sela  wrote:
>
> > So basically using new DoFn with *SplittableParDo*, *Window.Bound*, and
> > *GroupAlsoByWindow* requires a custom implementation by per runner as
> they
> > are not handled by DoFn anymore, right ?
> >
> > On Sun, Dec 11, 2016 at 3:42 PM Eugene Kirpichov
> >  wrote:
> >
> > > Hi Amit, I'll comment in more detail later, but meanwhile please take a
> > > look at https://github.com/apache/incubator-beam/pull/1565
> > > There is a small amount of relevant changes to spark runner.
> > > Take a look at implementation of SplittableParDo (already committed) in
> > > particular ProcessFn and it's usage in direct runner - this is exactly
> > what
> > > you're looking for, a new DoFn that with per-runner support is able to
> > emit
> > > multi-windowed values.
> > > On Sun, Dec 11, 2016 at 4:28 AM Amit Sela 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I've been working on migrating the Spark runner to new DoFn and I've
> > > > stumbled upon a couple of cases where OldDoFn is used in a way that
> > > > accessed windowInternals (outputWindowedValue) such as
> > AssignWindowsDoFn.
> > > >
> > > > Since changing windows is no longer the responsibility of DoFn I was
> > > > wondering who and how is this done.
> > > >
> > > > Thanks,
> > > > Amit
> > > >
> > >
> >
>


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #385

2016-12-11 Thread Jason Kuster
Filed https://issues.apache.org/jira/browse/BEAM-1130

On Sun, Dec 11, 2016 at 4:52 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  RunnableOnService_Spark/changes>
>
>


-- 
---
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow


Re: Jenkins build is still unstable: beam_PostCommit_Java_MavenInstall #2066

2016-12-11 Thread Jason Kuster
Filed https://issues.apache.org/jira/browse/BEAM-1130 for breaks here and
in spark RoS tests

On Sun, Dec 11, 2016 at 5:00 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See  MavenInstall/changes>
>
>


-- 
---
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow


Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1787

2016-12-11 Thread Davor Bonaci
>
> Is there any way to retry staging if it fails?
>>
>
I believe the code already does this, but Pei would know for sure.


Re: Scio Beam Scala API

2016-12-11 Thread Neville Li
Aviem,

You're right we have a scio/apache-beam
 branch that works with
Beam 0.2.0-incubating and are working on keeping it up with the latest
releases.

A few ways you can contribute:
- It runs with the Dataflow runner but hasn't been tested with other
runners. You're welcome to give it a try and submit issues/PRs.
- `scio-core` is somewhat coupled with Dataflow runner and GCP IO
dependencies right now but it'd be nice to further decouple them so users
can swap other runner/IO packages easily.
- We also have a master ticket #279
 that keeps track of pending
issues for Beam migration.

Keep in mind that our team of 3 supports 150+ production Scio users within
Spotify so we simply don't have the bandwidth to maintain 2 diverging repo
(spotify/scio vs apache/beam-incubating) right now. We'll probably revisit
this when internal users switch over to Beam sometime next year.


On Sun, Dec 11, 2016 at 4:45 PM Jean-Baptiste Onofré 
wrote:

> Hi
>
> I'm working on a feature branch with Neville and his guys. I already
> updated to last changes. I would like to propose a feature branch later
> this week.
>
> Regards
> JB⁣​
>
> On Dec 11, 2016, 16:39, at 16:39, Aviem Zur  wrote:
> >Hi,
> >
> >I've heard there has been work towards porting Scio Dataflow Scala API
> >to
> >Beam.
> >I was wondering at what stage this is in, where is this happening (Saw
> >no
> >branch in BEAM repository, and one in Scio repository that is dependent
> >on
> >beam 0.2.0-INCUBATING) and if there is a way to contribute?
>


Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Kenneth Knowles
You've got it right. My recommendations is to just directly implement it
for the Spark runner. It will often actually clean things up a bit. Here's
the analogous change for the Flink runner:
https://github.com/apache/incubator-beam/pull/1435/files.

With GABW, I tried going through the process of keeping some utility
expansion in runners-core, making StateInternalsFactory, refactoring
GroupAlsoByWindowsDoFn, then GroupByKeyViaGroupByKeyOnly,
GroupAlsoByWindow. But it ended up simpler for each runner to just not use
most of that and do it directly. (they all still share GABW but none of the
surrounding bits, IIRC)

On Sun, Dec 11, 2016 at 10:33 AM, Amit Sela  wrote:

> So basically using new DoFn with *SplittableParDo*, *Window.Bound*, and
> *GroupAlsoByWindow* requires a custom implementation by per runner as they
> are not handled by DoFn anymore, right ?
>
> On Sun, Dec 11, 2016 at 3:42 PM Eugene Kirpichov
>  wrote:
>
> > Hi Amit, I'll comment in more detail later, but meanwhile please take a
> > look at https://github.com/apache/incubator-beam/pull/1565
> > There is a small amount of relevant changes to spark runner.
> > Take a look at implementation of SplittableParDo (already committed) in
> > particular ProcessFn and it's usage in direct runner - this is exactly
> what
> > you're looking for, a new DoFn that with per-runner support is able to
> emit
> > multi-windowed values.
> > On Sun, Dec 11, 2016 at 4:28 AM Amit Sela  wrote:
> >
> > > Hi all,
> > >
> > > I've been working on migrating the Spark runner to new DoFn and I've
> > > stumbled upon a couple of cases where OldDoFn is used in a way that
> > > accessed windowInternals (outputWindowedValue) such as
> AssignWindowsDoFn.
> > >
> > > Since changing windows is no longer the responsibility of DoFn I was
> > > wondering who and how is this done.
> > >
> > > Thanks,
> > > Amit
> > >
> >
>


Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Amit Sela
So basically using new DoFn with *SplittableParDo*, *Window.Bound*, and
*GroupAlsoByWindow* requires a custom implementation by per runner as they
are not handled by DoFn anymore, right ?

On Sun, Dec 11, 2016 at 3:42 PM Eugene Kirpichov
 wrote:

> Hi Amit, I'll comment in more detail later, but meanwhile please take a
> look at https://github.com/apache/incubator-beam/pull/1565
> There is a small amount of relevant changes to spark runner.
> Take a look at implementation of SplittableParDo (already committed) in
> particular ProcessFn and it's usage in direct runner - this is exactly what
> you're looking for, a new DoFn that with per-runner support is able to emit
> multi-windowed values.
> On Sun, Dec 11, 2016 at 4:28 AM Amit Sela  wrote:
>
> > Hi all,
> >
> > I've been working on migrating the Spark runner to new DoFn and I've
> > stumbled upon a couple of cases where OldDoFn is used in a way that
> > accessed windowInternals (outputWindowedValue) such as AssignWindowsDoFn.
> >
> > Since changing windows is no longer the responsibility of DoFn I was
> > wondering who and how is this done.
> >
> > Thanks,
> > Amit
> >
>


Re: Scio Beam Scala API

2016-12-11 Thread Jean-Baptiste Onofré
Hi

I'm working on a feature branch with Neville and his guys. I already updated to 
last changes. I would like to propose a feature branch later this week.

Regards
JB⁣​

On Dec 11, 2016, 16:39, at 16:39, Aviem Zur  wrote:
>Hi,
>
>I've heard there has been work towards porting Scio Dataflow Scala API
>to
>Beam.
>I was wondering at what stage this is in, where is this happening (Saw
>no
>branch in BEAM repository, and one in Scio repository that is dependent
>on
>beam 0.2.0-INCUBATING) and if there is a way to contribute?


Scio Beam Scala API

2016-12-11 Thread Aviem Zur
Hi,

I've heard there has been work towards porting Scio Dataflow Scala API to
Beam.
I was wondering at what stage this is in, where is this happening (Saw no
branch in BEAM repository, and one in Scio repository that is dependent on
beam 0.2.0-INCUBATING) and if there is a way to contribute?


Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Eugene Kirpichov
Hi Amit, I'll comment in more detail later, but meanwhile please take a
look at https://github.com/apache/incubator-beam/pull/1565
There is a small amount of relevant changes to spark runner.
Take a look at implementation of SplittableParDo (already committed) in
particular ProcessFn and it's usage in direct runner - this is exactly what
you're looking for, a new DoFn that with per-runner support is able to emit
multi-windowed values.
On Sun, Dec 11, 2016 at 4:28 AM Amit Sela  wrote:

> Hi all,
>
> I've been working on migrating the Spark runner to new DoFn and I've
> stumbled upon a couple of cases where OldDoFn is used in a way that
> accessed windowInternals (outputWindowedValue) such as AssignWindowsDoFn.
>
> Since changing windows is no longer the responsibility of DoFn I was
> wondering who and how is this done.
>
> Thanks,
> Amit
>