Re: Apex runner status and next steps

2016-10-27 Thread Dan Halperin
I would add (explicitly, though this may be implicit or already supported)
that Apex should also be able to run the precommit
WordCountIT/WindowedWordCountIT tests that execute on all runners.

https://github.com/apache/incubator-beam/blob/master/examples/java/pom.xml#L42
and
https://github.com/apache/incubator-beam/blob/master/examples/java/pom.xml#L132

Dan

On Wed, Oct 26, 2016 at 10:39 PM, Jean-Baptiste Onofré wrote:

> +1
>
> Good idea and fully agree about the three points.
>
> Regards
> JB
>
> On Oct 26, 2016, at 19:24, Thomas Weise wrote:
> >Hi,
> >
> >The Apex runner is currently in a feature branch:
> >
> >https://github.com/apache/incubator-beam/tree/apex-runner
> >
> >The focus so far has been on functional completeness. The runner passes
> >all the integration tests.
> >
> >Apex, with its stateful stream processing architecture, can support all
> >of the concepts in the Beam model (event time, triggers, watermarks,
> >etc.). Most of these are already supported through the Beam SDK. Not
> >much glue code had to be written, which speaks to the conceptual
> >alignment in general.
> >
> >The runner in its current form does not leverage all the performance
> >and scalability that Apex can deliver. We expect to address this with
> >future contributions, leveraging things like incremental checkpointing,
> >partitioning and operator affinity from Apex.
> >
> >From a code perspective, the runner should be close to what is needed
> >for a merge to master (based on the contribution guidelines). The
> >following items have been identified as prerequisites:
> >
> >* Add a README.md to the runner directory that summarizes its current
> >  state
> >* Update the https://beam.apache.org/learn/runners/capability-matrix/
> >  to include the Apex info
> >* Create the page under learn/runners (at least the placeholder)
> >
> >Note also that the integration tests currently take quite a long time
> >to run with embedded Apex (~50 minutes). Some of that has to do with
> >how completion of the tests is determined, and there are ideas to
> >improve it.
> >
> >I have created some JIRAs from my TODO list of follow-up work for more
> >contributors to get involved:
> >
> >https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
> >
> >Some folks on the Apex dev list have expressed interest in taking up
> >some of this work. And thanks to Ismaël Mejía for BEAM-815!
> >
> >I'm looking forward to your comments and suggestions.
> >
> >Thanks,
> >Thomas
>


Re: Apex runner status and next steps

2016-10-26 Thread Jean-Baptiste Onofré
+1

Good idea and fully agree about the three points.

Regards
JB

On Oct 26, 2016, at 19:24, Thomas Weise wrote:
>Hi,
>
>The Apex runner is currently in a feature branch:
>
>https://github.com/apache/incubator-beam/tree/apex-runner
>
>The focus so far has been on functional completeness. The runner passes
>all the integration tests.
>
>Apex, with its stateful stream processing architecture, can support all
>of the concepts in the Beam model (event time, triggers, watermarks,
>etc.). Most of these are already supported through the Beam SDK. Not
>much glue code had to be written, which speaks to the conceptual
>alignment in general.
>
>The runner in its current form does not leverage all the performance
>and scalability that Apex can deliver. We expect to address this with
>future contributions, leveraging things like incremental checkpointing,
>partitioning and operator affinity from Apex.
>
>From a code perspective, the runner should be close to what is needed
>for a merge to master (based on the contribution guidelines). The
>following items have been identified as prerequisites:
>
>* Add a README.md to the runner directory that summarizes its current
>  state
>* Update the https://beam.apache.org/learn/runners/capability-matrix/
>  to include the Apex info
>* Create the page under learn/runners (at least the placeholder)
>
>Note also that the integration tests currently take quite a long time
>to run with embedded Apex (~50 minutes). Some of that has to do with
>how completion of the tests is determined, and there are ideas to
>improve it.
>
>I have created some JIRAs from my TODO list of follow-up work for more
>contributors to get involved:
>
>https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
>
>Some folks on the Apex dev list have expressed interest in taking up
>some of this work. And thanks to Ismaël Mejía for BEAM-815!
>
>I'm looking forward to your comments and suggestions.
>
>Thanks,
>Thomas


Re: Apex runner status and next steps

2016-10-26 Thread Davor Bonaci
+1.

I have nothing to add -- with those three bullets resolved, I think we can
move forward with the merge to master.

On Wed, Oct 26, 2016 at 10:24 AM, Thomas Weise wrote:

> Hi,
>
> The Apex runner is currently in a feature branch:
>
> https://github.com/apache/incubator-beam/tree/apex-runner
>
> The focus so far has been on functional completeness. The runner passes
> all the integration tests.
>
> Apex, with its stateful stream processing architecture, can support all
> of the concepts in the Beam model (event time, triggers, watermarks,
> etc.). Most of these are already supported through the Beam SDK. Not
> much glue code had to be written, which speaks to the conceptual
> alignment in general.
>
> The runner in its current form does not leverage all the performance and
> scalability that Apex can deliver. We expect to address this with future
> contributions, leveraging things like incremental checkpointing,
> partitioning and operator affinity from Apex.
>
> From a code perspective, the runner should be close to what is needed for a
> merge to master (based on the contribution guidelines). The following items
> have been identified as prerequisites:
>
> * Add a README.md to the runner directory that summarizes its current state
> * Update the https://beam.apache.org/learn/runners/capability-matrix/ to
> include the Apex info
> * Create the page under learn/runners (at least the placeholder)
>
> Note also that the integration tests currently take quite a long time
> to run with embedded Apex (~50 minutes). Some of that has to do with
> how completion of the tests is determined, and there are ideas to
> improve it.
>
> I have created some JIRAs from my TODO list of follow-up work for more
> contributors to get involved:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
>
> Some folks on the Apex dev list have expressed interest in taking up
> some of this work. And thanks to Ismaël Mejía for BEAM-815!
>
> I'm looking forward to your comments and suggestions.
>
> Thanks,
> Thomas
>


Apex runner status and next steps

2016-10-26 Thread Thomas Weise
Hi,

The Apex runner is currently in a feature branch:

https://github.com/apache/incubator-beam/tree/apex-runner

The focus so far has been on functional completeness. The runner passes all
the integration tests.

Apex, with its stateful stream processing architecture, can support all of
the concepts in the Beam model (event time, triggers, watermarks, etc.).
Most of these are already supported through the Beam SDK. Not much glue
code had to be written, which speaks to the conceptual alignment in general.

The runner in its current form does not leverage all the performance and
scalability that Apex can deliver. We expect to address this with future
contributions, leveraging things like incremental checkpointing,
partitioning and operator affinity from Apex.

From a code perspective, the runner should be close to what is needed for a
merge to master (based on the contribution guidelines). The following items
have been identified as prerequisites:

* Add a README.md to the runner directory that summarizes its current state
* Update the https://beam.apache.org/learn/runners/capability-matrix/ to
include the Apex info
* Create the page under learn/runners (at least the placeholder)

Note also that the integration tests currently take quite a long time to
run with embedded Apex (~50 minutes). Some of that has to do with how
completion of the tests is determined, and there are ideas to improve it.

I have created some JIRAs from my TODO list of follow-up work for more
contributors to get involved:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex

Some folks on the Apex dev list have expressed interest in taking up some of
this work. And thanks to Ismaël Mejía for BEAM-815!

I'm looking forward to your comments and suggestions.

Thanks,
Thomas