Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-25 Thread Maximilian Michels

+1 (binding)

On 25.02.19 03:44, Konstantinos Katsiapis wrote:

+1

We (TFX) are really looking forward to
the Python 3 compatibility that Apache Beam 2.11 brings. The 2.11
release will allow several of our existing Apache Beam based libraries,
like TensorFlow Data Validation, TensorFlow Transform and TensorFlow
Model Analysis, to be Python 3 compatible (they are already Python 3
"ready" and as such blocked on this release).


Thanks,
Gus

On Fri, Feb 22, 2019 at 7:08 PM Reuven Lax wrote:


+1 (binding)

On Fri, Feb 22, 2019 at 5:09 PM Robert Bradshaw <rober...@google.com> wrote:

+1 (binding)

I verified the artifacts for correctness, as well as one of the
wheels
on simple pipelines (Python 3).


On Sat, Feb 23, 2019 at 1:01 AM Kenneth Knowles <k...@apache.org> wrote:
 >
 > +1 (binding)
 >
 > Kenn
 >
 > On Fri, Feb 22, 2019 at 3:51 PM Ahmet Altay <al...@google.com> wrote:
 >>
 >>
 >>
 >> On Fri, Feb 22, 2019 at 3:46 PM Kenneth Knowles <k...@apache.org> wrote:
 >>>
 >>> I believe you need to sign & hash the Python wheels. The
 >>> instructions are unfortunately a bit hidden in the release guide
 >>> without an entry in the table of contents:
 >>
 >>
 >> Done, thank you for the pointer.
 >>
 >>>
 >>>
 >>> "Once all python wheels have been staged dist.apache.org, please run
 >>> ./sign_hash_python_wheels.sh to sign and hash python wheels."
 >>>
 >>> On Fri, Feb 22, 2019 at 8:40 AM Ahmet Altay <al...@google.com> wrote:
 
 
 
  On Fri, Feb 22, 2019 at 1:32 AM Robert Bradshaw <rober...@google.com> wrote:
 >
 > It looks like
 > https://github.com/apache/beam/blob/release-2.11.0/build.gradle
 > differs from the copy in the release source tarball (line 22, and some
 > whitespace below). Other than that, the artifacts and signatures look
 > good.
 
 
  Thank you. I fixed the issue (please take a look again).
The difference was due to
https://issues.apache.org/jira/browse/BEAM-6726.
 
 >
 >
 > On Fri, Feb 22, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
 > >
 > > Hi everyone,
 > >
 > > Please review and vote on the release candidate #1 for
 > > the version 2.11.0, as follows:
 > >
 > > [ ] +1, Approve the release
 > > [ ] -1, Do not approve the release (please provide
 > > specific comments)
 > >
 > > The complete staging area is available for your review,
 > > which includes:
 > > * JIRA release notes [1],
 > > * the official Apache source release to be deployed to
 > > dist.apache.org [2], which is signed with the key with
 > > fingerprint 64B84A5AD91F9C20F5E9D9A7D62E71416096FA00 [3],
 > > * all artifacts to be deployed to the Maven Central
 > > Repository [4],
 > > * source code tag "v2.11.0-RC1" [5],
 > > * website pull request listing the release [6] and
 > > publishing the API reference manual [7].
 > > * Python artifacts are deployed along with the source
 > > release to the dist.apache.org [2].
 > > * Validation sheet with a tab for 2.11.0 release to
 > > help with validation [8].
 > >
 > > The vote will be open for at least 72 hours. It is
 > > adopted by majority approval, with at least 3 PMC affirmative votes.
 > >
 > > Thanks,
 > > Ahmet
 > >
 > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344775
 > > [2] https://dist.apache.org/repos/dist/dev/beam/2.11.0/
 > > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 > > [4] https://repository.apache.org/content/repositories/orgapachebeam-1061/
 > > [5] https://github.com/apache/beam/tree/v2.11.0-RC1
 > > [6] https://github.com/apache/beam/pull/7924
 > > [7] https://github.com/apache/beam-site/pull/587
 > > [8] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=542393513
 > >



--
Gus Katsiapis | Softw
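
The sign-and-hash step discussed in this thread is performed by the release guide's ./sign_hash_python_wheels.sh, whose contents are not shown here. As a rough illustration of the hashing half only, here is a minimal Python sketch. It assumes SHA-512 digests written in the layout that sha512sum prints; the helper names `sha512_of_file` and `checksum_line` and the throwaway demo file are hypothetical and not taken from the Beam scripts.

```python
import hashlib
import os
import tempfile


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path):
    """Format '<digest>  <file name>', the layout sha512sum prints."""
    return "{}  {}".format(sha512_of_file(path), os.path.basename(path))


if __name__ == "__main__":
    # Demo on a throwaway file; a real run would point at a staged wheel.
    with tempfile.NamedTemporaryFile(suffix=".whl", delete=False) as tmp:
        tmp.write(b"not a real wheel")
    print(checksum_line(tmp.name))
    os.remove(tmp.name)
```

Signing (the GPG half of the step) is deliberately left out of this sketch.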

Re: Issue building master

2019-02-25 Thread Michael Luckey
Hi JB,

great you're back.

Last time I encountered those issues, they were:

1. Python virtualenv failing on a missing python3.5. Note: the build currently
asks explicitly for python3.5, so you have to make sure that's on your path.
But of course you might have hit another issue, which is difficult to diagnose
without further insights.

2. While testing with "-PisRelease" enabled I also hit that issue. The
release path seems to install that locally, which causes RAT to fail on the
next build. Not sure whether you had that flag set, though. I 'resolved' that
issue for myself by manually deleting the created folder before issuing a new
Gradle build. I am unsure whether those files should be produced with a license
included or whether they need to be excluded from RAT checks, as Kenn suggested.

hth,
michel
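
A quick way to check the first point before starting a build is to look for the interpreter on the PATH. The following is a minimal sketch, assuming only that the build looks up an executable literally named python3.5; the helper names `find_interpreter` and `interpreter_version` are hypothetical.

```python
import shutil
import subprocess


def find_interpreter(name="python3.5"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def interpreter_version(path):
    """Ask the interpreter at `path` for its version string."""
    result = subprocess.run(
        [path, "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_interpreter()
    if path is None:
        print("python3.5 not found on PATH; the virtualenv step may fail")
    else:
        print(path, interpreter_version(path))
```

If the interpreter is found but virtualenv still exits with a non-zero value, the cause is probably something else, as noted above.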


On Mon, Feb 25, 2019 at 7:05 AM Jean-Baptiste Onofré wrote:

> Hi Kenn,
>
> I started with a git clean -fdx, so it's maybe the build which created
> those files.
>
> Regards
> JB
>
> On 25/02/2019 00:16, Kenneth Knowles wrote:
> > To #2 it looks like it has done an install relative to the Python SDK
> > directory. The files are not set up to be ignored by git/RAT.
> >
> > Kenn
> >
> > On Sun, Feb 24, 2019 at 2:02 PM Reuven Lax wrote:
> >
> >
> > Hi JB. Good to hear from you!
> >
> > What version of Python do you have installed?
> >
> > Reuven
> >
> > On Sun, Feb 24, 2019 at 11:28 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >
> > Hi guys,
> >
> > As I said a couple of weeks ago, I'm happy to be back on the
> > project. I'm resuming several PRs and improvements, including work
> > on the Spark runner with "my" guys (Etienne and Alexey especially).
> >
> > I have a couple of issues while building master, both on the
> > Python SDK:
> >
> > 1. The first issue is that virtualenv doesn't work on my box. It
> > fails with:
> >
> > Process 'command 'virtualenv'' finished with non-zero exit value 3
> >
> > 2. It's maybe a consequence of 1, but I also see RAT failures on
> > Python SDK:
> >
> > Unapproved/unknown license:
> > sdks/python/apache-beam-2.12.0.dev0/PKG-INFO
> > Unapproved/unknown license:
> > sdks/python/apache-beam-2.12.0.dev0/setup.cfg
> >
> > Do you have the same issue? In the meantime, I'm checking on
> > Jenkins and the environment.
> >
> > Regards
> > JB
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Beam Dependency Check Report (2019-02-25)

2019-02-25 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27 | 2018-12-10 | BEAM-5968
oauth2client | 3.0.0 | 4.1.3 | 2018-12-10 | 2018-12-10 | BEAM-6089

High Priority Dependency Updates Of Beam Java SDK:

Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
com.rabbitmq:amqp-client | 4.6.0 | 5.6.0 | 2018-03-26 | 2019-01-25 | BEAM-5895
org.apache.rat:apache-rat-tasks | 0.12 | 0.13 | 2016-06-07 | 2018-10-13 | BEAM-6039
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc4 | 2014-10-25 | 2017-12-11 | BEAM-5541
com.amazonaws:aws-java-sdk-kinesis | 1.11.255 | 1.11.505 | 2017-12-23 | 2019-02-22 | BEAM-6330
com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.17.0 | 0.20.0 | 2019-02-11 | 2019-02-11 | BEAM-6645
org.conscrypt:conscrypt-openjdk | 1.1.3 | 2.0.0 | 2018-06-04 | 2019-02-13 | BEAM-5748
org.elasticsearch:elasticsearch | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6090
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 7.0.0-beta1 | 2016-10-26 | 2019-02-13 | BEAM-5551
org.elasticsearch.client:elasticsearch-rest-client | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6091
com.google.errorprone:error_prone_annotations | 2.1.2 | 2.3.3 | 2017-10-19 | 2019-02-22 | BEAM-6741
org.elasticsearch.test:framework | 6.4.0 | 7.0.0-beta1 | 2018-08-18 | 2019-02-13 | BEAM-6092
io.grpc:grpc-context | 1.13.1 | 1.18.0 | 2018-06-21 | 2019-01-15 | BEAM-5897
io.grpc:grpc-protobuf | 1.13.1 | 1.18.0 | 2018-06-21 | 2019-01-15 | BEAM-5900
io.grpc:grpc-testing | 1.13.1 | 1.18.0 | 2018-06-21 | 2019-01-15 | BEAM-5902
com.google.code.gson:gson | 2.7 | 2.8.5 | 2016-06-14 | 2018-05-22 | BEAM-5558
org.apache.hbase:hbase-common | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5560
org.apache.hbase:hbase-hadoop-compat | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5561
org.apache.hbase:hbase-hadoop2-compat | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5562
org.apache.hbase:hbase-server | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5563
org.apache.hbase:hbase-shaded-client | 1.2.6 | 2.1.3 | 2017-05-29 | 2019-02-11 | BEAM-5564
org.apache.hive:hive-cli | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5566
org.apache.hive:hive-common | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5567
org.apache.hive:hive-exec | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5568
org.apache.hive.hcatalog:hive-hcatalog-core | 2.1.0 | 3.1.1 | 2016-06-17 | 2018-10-24 | BEAM-5569
net.java.dev.javacc:javacc | 4.0 | 7.0.4 | 2006-03-17 | 2018-09-17 | BEAM-5570
javax.servlet:javax.servlet-api | 3.1.0 | 4.0.1 | 2013-04-25 | 2018-04-20 | BEAM-5750
redis.clients:jedis | 2.9.0 | 3.0.1 | 2016-07-22 | 2018-12-27 | BEAM-6125
org.eclipse.jetty:jetty-server | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5752
org.eclipse.jetty:jetty-servlet | 9.2.10.v20150310 | 9.4.15.v20190215 | 2015-03-10 | 2019-02-15 | BEAM-5753
net.java.dev.jna:jna | 4.1.0 | 5.2.0 | 2014-03-06 | 2018-12-23 | BEAM-5573
junit:junit | 4.13-beta-1 | 4.13-beta-2 | 2018-11-25 | 2019-02-02 | BEAM-6127
com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC2 | 2018-03-20 | 2019-02-05 | BEAM-5809
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
org.apache.kudu:kudu-client | 1.4.0 | 1.8.0 | 2017-06-05 | 2018-10-16 | BEAM-5575
io.dropwizard.metrics:metrics-core | 3.1.2 | 4.1.0-rc3 | 2015-04-26 | 2018-12-30 | BEAM-5576
io.grpc:protoc-gen-grpc-java | 1.13.1 | 1.18.0 | 2018-06-21 | 2019-01-15 | BEAM-5903
org.apache.qpid:proton-j | 0.13.1 | 0.31.0 | 2016-07-02 | 2018-11-23 | BEAM-5582
   

Re: Issue building master

2019-02-25 Thread Jean-Baptiste Onofré
Hi Michael,

Thanks for the update, I'm updating my box right now.

Regards
JB

On 25/02/2019 13:03, Michael Luckey wrote:
> Hi JB,
> 
> great you're back.
> 
> Last time I encountered those issues, they were:
> 
> 1. Python virtualenv failing on a missing python3.5. Note: the build
> currently asks explicitly for python3.5, so you have to make sure that's
> on your path. But of course you might have hit another issue, which is
> difficult to diagnose without further insights.
> 
> 2. While testing with "-PisRelease" enabled I also hit that issue. The
> release path seems to install that locally, which causes RAT to fail on
> the next build. Not sure whether you had that flag set, though. I
> 'resolved' that issue for myself by manually deleting the created folder
> before issuing a new Gradle build. I am unsure whether those files should
> be produced with a license included or whether they need to be excluded
> from RAT checks, as Kenn suggested.
> 
> hth,
> michel
> 
> 
> On Mon, Feb 25, 2019 at 7:05 AM Jean-Baptiste Onofré wrote:
> 
> Hi Kenn,
> 
> I started with a git clean -fdx, so it's maybe the build which created
> those files.
> 
> Regards
> JB
> 
> On 25/02/2019 00:16, Kenneth Knowles wrote:
> > To #2 it looks like it has done an install relative to the Python SDK
> > directory. The files are not set up to be ignored by git/RAT.
> >
> > Kenn
> >
> > On Sun, Feb 24, 2019 at 2:02 PM Reuven Lax wrote:
> >
> >
> >     Hi JB. Good to hear from you!
> >
> >     What version of Python do you have installed?
> >
> >     Reuven
> >
> >     On Sun, Feb 24, 2019 at 11:28 AM Jean-Baptiste Onofré
> >     <j...@nanthrax.net> wrote:
> >
> >         Hi guys,
> >
> >         As said couple of weeks ago, I'm happy to be back on the
> >         project. I'm
> >         resuming several PRs and improvements, including work on Spark
> >         runner
> >         with "my" guys (Etienne and Alexey especially).
> >
> >         I have couple of issues while building master, both on the
> >         Python SDK:
> >
> >         1. The first issue is that virtualenv doesn't work on my box. It
> >         fails with:
> >
> >         Process 'command 'virtualenv'' finished with non-zero exit value 3
> >
> >         2. It's maybe a consequence of 1, but I also see RAT failures on
> >         Python SDK:
> >
> >         Unapproved/unknown license:
> >         sdks/python/apache-beam-2.12.0.dev0/PKG-INFO
> >         Unapproved/unknown license:
> >         sdks/python/apache-beam-2.12.0.dev0/setup.cfg
> >
> >         Do you have the same issue ? In the mean time, I'm checking on
> >         Jenkins
> >         and the environment.
> >
> >         Regards
> >         JB
> >         --
> >         Jean-Baptiste Onofré
> >         jbono...@apache.org 
> >
> >         http://blog.nanthrax.net
> >         Talend - http://www.talend.com
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-25 Thread Łukasz Gajowy
Hi,

https://issues.apache.org/jira/browse/BEAM-6697 Is this issue a release
blocker? I'm asking because performance tests are not part of the release
verification checklist. (Should they be?)

The issue seems to be related to `google_cloud_bigdataoss_version` change
(1.9.0 -> 1.9.12)

Łukasz

On Mon, 25 Feb 2019 at 11:08, Maximilian Michels wrote:

> +1 (binding)
>
> On 25.02.19 03:44, Konstantinos Katsiapis wrote:
> > +1
> >
> > We (TFX) are really looking forward to
> > the Python 3 compatibility that Apache Beam 2.11 brings. The 2.11
> > release will allow several of our existing Apache Beam based libraries,
> > like TensorFlow Data Validation, TensorFlow Transform and TensorFlow
> > Model Analysis, to be Python 3 compatible (they are already Python 3
> > "ready" and as such blocked on this release).
> >
> > Thanks,
> > Gus
> >
> > On Fri, Feb 22, 2019 at 7:08 PM Reuven Lax wrote:
> >
> > +1 (binding)
> >
> > On Fri, Feb 22, 2019 at 5:09 PM Robert Bradshaw wrote:
> >
> > +1 (binding)
> >
> > I verified the artifacts for correctness, as well as one of the
> > wheels
> > on simple pipelines (Python 3).
> >
> >
> > On Sat, Feb 23, 2019 at 1:01 AM Kenneth Knowles wrote:
> >  >
> >  > +1 (binding)
> >  >
> >  > Kenn
> >  >
> >  > On Fri, Feb 22, 2019 at 3:51 PM Ahmet Altay wrote:
> >  >>
> >  >>
> >  >>
> >  >> On Fri, Feb 22, 2019 at 3:46 PM Kenneth Knowles
> > mailto:k...@apache.org>> wrote:
> >  >>>
> >  >>> I believe you need to sign & hash the Python wheels. The
> > instructions is unfortunately a bit hidden in the release guide
> > without an entry in the table of contents:
> >  >>
> >  >>
> >  >> Done, thank you for the pointer.
> >  >>
> >  >>>
> >  >>>
> >  >>> "Once all python wheels have been staged dist.apache.org
> > , please run
> > ./sign_hash_python_wheels.sh to sign and hash python wheels."
> >  >>>
> >  >>> On Fri, Feb 22, 2019 at 8:40 AM Ahmet Altay
> > mailto:al...@google.com>> wrote:
> >  
> >  
> >  
> >   On Fri, Feb 22, 2019 at 1:32 AM Robert Bradshaw
> > mailto:rober...@google.com>> wrote:
> >  >
> >  > It looks like
> > https://github.com/apache/beam/blob/release-2.11.0/build.gradle
> >  > differs from the copy in the release source tarball (line
> > 22, and some
> >  > whitespace below). Other than that, the artifacts and
> > signatures look
> >  > good.
> >  
> >  
> >   Thank you. I fixed the issue (please take a look again).
> > The difference was due to
> > https://issues.apache.org/jira/browse/BEAM-6726.
> >  
> >  >
> >  >
> >  > On Fri, Feb 22, 2019 at 9:50 AM Ahmet Altay
> > mailto:al...@google.com>> wrote:
> >  > >
> >  > > Hi everyone,
> >  > >
> >  > > Please review and vote on the release candidate #1 for
> > the version 2.11.0, as follows:
> >  > >
> >  > > [ ] +1, Approve the release
> >  > > [ ] -1, Do not approve the release (please provide
> > specific comments)
> >  > >
> >  > > The complete staging area is available for your review,
> > which includes:
> >  > > * JIRA release notes [1],
> >  > > * the official Apache source release to be deployed to
> > dist.apache.org  [2], which is signed
> > with the key with fingerprint
> > 64B84A5AD91F9C20F5E9D9A7D62E71416096FA00 [3],
> >  > > * all artifacts to be deployed to the Maven Central
> > Repository [4],
> >  > > * source code tag "v2.11.0-RC1" [5],
> >  > > * website pull request listing the release [6] and
> > publishing the API reference manual [7].
> >  > > * Python artifacts are deployed along with the source
> > release to the dist.apache.org  [2].
> >  > > * Validation sheet with a tab for 2.11.0 release to
> > help with validation [8].
> >  > >
> >  > > The vote will be open for at least 72 hours. It is
> > adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >  >>
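
The adoption rule quoted above (majority approval with at least 3 PMC affirmative votes) can be sketched as a small check. This is an illustration only: it assumes that only binding votes count toward both the threshold and the majority, it ignores the 72-hour minimum, and the function name `release_vote_passes` is hypothetical.

```python
def release_vote_passes(votes, min_binding_plus_ones=3):
    """votes: iterable of (value, binding) pairs, value being +1 or -1."""
    binding = [value for value, is_binding in votes if is_binding]
    plus = sum(1 for value in binding if value > 0)
    minus = sum(1 for value in binding if value < 0)
    # Enough binding +1 votes, and more binding +1 than binding -1.
    return plus >= min_binding_plus_ones and plus > minus


# Votes visible in this thread so far: four binding +1s and one non-binding +1.
thread_votes = [(+1, True), (+1, False), (+1, True), (+1, True), (+1, True)]
print(release_vote_passes(thread_votes))  # prints True
```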

Re: Beam Jenkins job summary available in .test-infra/jenkins/README.md

2019-02-25 Thread Mark Liu
Glad to hear that!

Thanks,
Mark

On Sat, Feb 16, 2019 at 1:53 PM Maximilian Michels  wrote:

> Hi Mark,
>
> That's super useful. I often end up using Jenkins' search which isn't
> all that great and finds jobs from all Apache projects. Thank you!
>
> Cheers,
> Max
>
> On 15.02.19 22:34, Mark Liu wrote:
> > TL;DR: Check out .test-infra/jenkins/README.md
> > <https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md>
> > for Beam Jenkins job summary!
> >
> > Hi folks,
> >
> > I found it's difficult for me to quickly find a particular Jenkins job
> > link or PR trigger phrase during development and PR review. So I
> > collected some useful job information from the groovy files and put it in
> > .test-infra/jenkins/README.md
> > <https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md>,
> > and also linked this file from the PR template
> > <https://github.com/apache/beam/blob/master/.github/PULL_REQUEST_TEMPLATE.md>.
> > Due to the large number of jobs we are currently running, I grouped them
> > into a few tables: PreCommit, PostCommit, Performance, Inventory and Others.
> > Hopefully this is clear and also helpful to other contributors.
> >
> > Since the README is generated from the current state of the Jenkins groovy
> > files, unfortunately any further changes won't be reflected there
> > without a manual update.
> >
> > Thanks,
> > Mark
> >
>


What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Alex Amato
I made a thread about this a while back for Java, but I don't think the
same commands, like spotless, work for Python.

I'm looking for:
* auto-fixing lint issues
* running quick checks which would fail the PR (without running the whole
precommit?)
* something like findbugs to detect common issues (e.g. py3 compliance)

FWIW, this is what I have been using for java. It will catch pretty much
everything except presubmit test failures.

./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
compileTestJava && ./gradlew compileJava


Re: Kettle Beam 0.5.0

2019-02-25 Thread Matt Casters
Hi Kenn,

It's fundamentally a question I asked myself a few times when I see
questions on this very mailing list.  Automatic column detection, weird
data sources... all these things were already solved in Kettle a long
time ago.

The core Kettle API for a transformation step (as it is called) follows
similar logic to Apache Beam Transform in the sense that a step reads rows
of data and writes them. Things like side-loading are also supported but
also a bunch of other options like directing rows to specific other target
steps (switch/case) or reading from specific source steps (Merge join
specifying left/right).
These similarities have made it "fairly easy" to wrap them up in
Transform/DoFn and ultimately convert Kettle transformations into Beam
pipelines.

I think we can make it easier in the future by making some changes to the
core API of Kettle itself. The API has been working fine for over 15 years
and the conversion is doable now, but I think there are things we learned
along the way, and there are more options right now.
Before we do something like that however we (the core Kettle community) are
contemplating making Kettle itself an Apache incubator project. Kettle is
pretty widely used in large organisations across the globe and the Apache
cooperation model is something we think would work better than what is
currently in place for all sorts of reasons I won't go into as I'm trying
to phrase this as diplomatically as possible.  If anyone has suggestions on
this subject, please reach out to me.

But to the core of your question: I do see a lot of value in a reverse wrap
of a generic IO wrapper around a bunch of Kettle input and output step
plugins. Instead of converting Kettle metadata into the Beam API you would
convert Beam properties to Kettle metadata in some smart way, probably
simply by sub-classing some Kettle metadata beans to implement Input or
Output interfaces.
What would be an issue is that any data integration running off of metadata
(any ETL tool really) requires input and output formats to be predictable.
This means that there needs to be a certain contract as to what goes in and
out of steps in any shape or form.  Because of this, the current pipelines
we build pass around data in the form of KettleRow objects
(PCollection<KettleRow>). KettleRow is just an Object[] wrapper and you get
a description of what's in there.  If folks can live with that they can
easily convert this data to other formats.

All the best,

Matt
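
To illustrate the "Object[] wrapper plus a description" idea from the last paragraph, here is a minimal Python sketch. It is not Kettle or Beam API: the class names `RowSketch` and `RowDescription` and the field names are hypothetical stand-ins, used only to show how positional values paired with metadata can be converted to another format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RowSketch:
    """Hypothetical stand-in for KettleRow: positional values only."""
    values: List[Any]


@dataclass
class RowDescription:
    """Hypothetical stand-in for the metadata describing each position."""
    field_names: List[str]

    def to_dict(self, row: RowSketch) -> Dict[str, Any]:
        """Pair each positional value with its field name."""
        if len(row.values) != len(self.field_names):
            raise ValueError("row does not match its description")
        return dict(zip(self.field_names, row.values))


description = RowDescription(["id", "name"])
print(description.to_dict(RowSketch([7, "beam"])))  # {'id': 7, 'name': 'beam'}
```

The point of the split is that the row itself stays a cheap positional container, while the contract about what each position means travels separately.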

On Mon, 25 Feb 2019 at 00:25, Kenneth Knowles wrote:

> Nice work! I'm impressed at how quickly this has come together.
>
> Did you build a generic adapter for using Kettle connectors in Beam? (I
> don't know what a Kettle connector API looks like)
>
> It would be cool to make these connectors more broadly available to Beam
> users, though maybe not optimal for parallel big data reads.
>
> Kenn
>
> On Sun, Feb 24, 2019 at 1:13 PM Matt Casters 
> wrote:
>
>>
>> Folks, it's not my habit but playing around with running Kettle
>> transformations on Flink w/ Beam was so cool I had to blog about it.
>>
>>
>> http://sandbox.kettle.be/wordpress/index.php/2019/02/24/kettle-beam-update-0-5-0/
>>
>> Allow me to again extend my thanks to all the developers involved.  Some
>> really cool things are happening right now.
>> Version 0.5.0 of Kettle Beam now supports all Kettle steps including
>> third party connectors like SalesForce, SAP, Neo4j and so on.  Obviously
>> they don't always make sense in a big data context but side-loading the
>> data for in-memory lookup and so on can indeed make a lot of sense in a lot
>> of scenarios.
>> For the batched output I also managed to get performance on-par with
>> expectations, specifically for Neo4j since I work for the company after
>> all.  I really appreciate all the help I got so far getting to this point.
>> In a record time we've gone from conceptual work to something we can
>> consider to be stable. Apache Beam has really made a huge difference.
>>
>> Cheers,
>>
>> Matt
>> ---
>> Matt Casters attcast...@gmail.com>
>> Senior Solution Architect, Kettle Project Founder
>>
>>
>>


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Michael Luckey
Hi,

just curious to know, why you prefer those && instead of a plain

./gradlew spotlessApply checkstyleMain checkstyleTest javadoc findbugsMain
compileJava compileTestJava

michel

On Mon, Feb 25, 2019 at 7:13 PM Alex Amato  wrote:

> I made a thread about this a while back for java, but I don't think the
> same commands like spotless work for python.
>
> auto fixing lint issues
> running and quick checks which would fail the PR (without running the
> whole precommit?)
> Something like findbugs to detect common issues (i.e. py3 compliance)
>
> FWIW, this is what I have been using for java. It will catch pretty much
> everything except presubmit test failures.
>
> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
> compileTestJava && ./gradlew compileJava
>


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Kenneth Knowles
FWIW gradle is a depgraph-based build system. You can gain a few seconds by
putting all but spotlessApply in one command.

./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest javadoc
findbugsMain compileTestJava compileJava

It might be clever to define a meta-task. Gradle "base plugin" has the
notable check (build and run tests), assemble (make artifacts), and build
(assemble + check, badly named!)

I think something like "everything except running tests and building
artifacts" might be helpful.

Kenn
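
The two-invocation form suggested above can also be wrapped in a small helper script. This is a minimal sketch, assuming it is run from the root of a Beam checkout where ./gradlew exists; the task list is copied from this thread and the helper names `gradle_commands` and `run_checks` are hypothetical.

```python
import subprocess

# Check tasks named in this thread; spotlessApply is kept separate on purpose.
CHECK_TASKS = [
    "checkstyleMain", "checkstyleTest", "javadoc",
    "findbugsMain", "compileTestJava", "compileJava",
]


def gradle_commands(check_tasks=CHECK_TASKS, wrapper="./gradlew"):
    """Return the two invocations: formatter first, then all checks at once."""
    return [[wrapper, "spotlessApply"], [wrapper] + list(check_tasks)]


def run_checks(cwd="."):
    """Run each invocation in order, stopping at the first failure (like &&)."""
    for command in gradle_commands():
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_checks() inside a checkout.
    for command in gradle_commands():
        print(" ".join(command))
```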

On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:

> I made a thread about this a while back for java, but I don't think the
> same commands like spotless work for python.
>
> auto fixing lint issues
> running and quick checks which would fail the PR (without running the
> whole precommit?)
> Something like findbugs to detect common issues (i.e. py3 compliance)
>
> FWIW, this is what I have been using for java. It will catch pretty much
> everything except presubmit test failures.
>
> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
> compileTestJava && ./gradlew compileJava
>


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Udi Meiri
Talking about Python:
I only know of "./gradlew lint", which includes style and some py3
compliance checking.
There is no auto-fix like spotlessApply AFAIK.

As a side-note, I really dislike our python line continuation indent rule,
since pycharm can't be configured to adhere to it and I find myself
manually adjusting whitespace all the time.


On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:

> FWIW gradle is a depgraph-based build system. You can gain a few seconds
> by putting all but spotlessApply in one command.
>
> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest javadoc
> findbugsMain compileTestJava compileJava
>
> It might be clever to define a meta-task. Gradle "base plugin" has the
> notable check (build and run tests), assemble (make artifacts), and build
> (assemble + check, badly named!)
>
> I think something like "everything except running tests and building
> artifacts" might be helpful.
>
> Kenn
>
> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>
>> I made a thread about this a while back for java, but I don't think the
>> same commands like spotless work for python.
>>
>> auto fixing lint issues
>> running and quick checks which would fail the PR (without running the
>> whole precommit?)
>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>
>> FWIW, this is what I have been using for java. It will catch pretty much
>> everything except presubmit test failures.
>>
>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>> compileTestJava && ./gradlew compileJava
>>
>




Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Alex Amato
@Michael, no particular reason. I think Kenn's suggestion makes more sense.

On Mon, Feb 25, 2019 at 10:36 AM Udi Meiri  wrote:

> Talking about Python:
> I only know of "./gradlew lint", which include style and some py3
> compliance checking.
> There is no auto-fix like spotlessApply AFAIK.
>
> As a side-note, I really dislike our python line continuation indent rule,
> since pycharm can't be configured to adhere to it and I find myself
> manually adjusting whitespace all the time.
>
>
> On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:
>
>> FWIW gradle is a depgraph-based build system. You can gain a few seconds
>> by putting all but spotlessApply in one command.
>>
>> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
>> javadoc findbugsMain compileTestJava compileJava
>>
>> It might be clever to define a meta-task. Gradle "base plugin" has the
>> notable check (build and run tests), assemble (make artifacts), and build
>> (assemble + check, badly named!)
>>
>> I think something like "everything except running tests and building
>> artifacts" might be helpful.
>>
>> Kenn
>>
>> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>>
>>> I made a thread about this a while back for java, but I don't think the
>>> same commands like spotless work for python.
>>>
>>> auto fixing lint issues
>>> running and quick checks which would fail the PR (without running the
>>> whole precommit?)
>>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>>
>>> FWIW, this is what I have been using for java. It will catch pretty much
>>> everything except presubmit test failures.
>>>
>>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>>> compileTestJava && ./gradlew compileJava
>>>
>>


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Ruoyun Huang
Out of curiosity as a light Gradle user, I did a side-by-side comparison,
and the readings confirm what Kenn and Michael suggest.

In the same repository, I ran gradle clean followed by either of the two
commands and measured their runtimes. The latter takes about *1/3* of the
running time.

time ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
compileTestJava && ./gradlew compileJava
real 9m29.330s
user 0m11.330s
sys 0m1.239s

time ./gradlew spotlessApply checkstyleMain checkstyleTest javadoc
findbugsMain compileJava compileTestJava
real 3m35.573s
user 0m2.701s
sys 0m0.327s
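
The "about 1/3" figure can be checked from the two wall-clock readings above. A small sketch of the arithmetic; the helper name `to_seconds` is hypothetical and the readings are copied from this message.

```python
import re


def to_seconds(reading):
    """Convert a `time` reading such as '9m29.330s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", reading)
    if match is None:
        raise ValueError("unexpected time format: {!r}".format(reading))
    return int(match.group(1)) * 60 + float(match.group(2))


chained = to_seconds("9m29.330s")  # seven separate ./gradlew invocations
single = to_seconds("3m35.573s")   # one ./gradlew invocation
print(round(single / chained, 2))  # prints 0.38
```

So the single invocation took roughly 38% of the chained run's wall-clock time in this one measurement.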







On Mon, Feb 25, 2019 at 10:47 AM Alex Amato  wrote:

> @Michael, no particular reason. I think Ken's suggestion makes more sense.
>
> On Mon, Feb 25, 2019 at 10:36 AM Udi Meiri  wrote:
>
>> Talking about Python:
>> I only know of "./gradlew lint", which include style and some py3
>> compliance checking.
>> There is no auto-fix like spotlessApply AFAIK.
>>
>> As a side-note, I really dislike our python line continuation indent
>> rule, since pycharm can't be configured to adhere to it and I find myself
>> manually adjusting whitespace all the time.
>>
>>
>> On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:
>>
>>> FWIW gradle is a depgraph-based build system. You can gain a few seconds
>>> by putting all but spotlessApply in one command.
>>>
>>> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest
>>> javadoc findbugsMain compileTestJava compileJava
>>>
>>> It might be clever to define a meta-task. Gradle "base plugin" has the
>>> notable check (build and run tests), assemble (make artifacts), and build
>>> (assemble + check, badly named!)
>>>
>>> I think something like "everything except running tests and building
>>> artifacts" might be helpful.
>>>
>>> Kenn
>>>
>>> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>>>
>>>> I made a thread about this a while back for java, but I don't think the
>>>> same commands like spotless work for python.
>>>>
>>>> auto fixing lint issues
>>>> running and quick checks which would fail the PR (without running the
>>>> whole precommit?)
>>>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>>>
>>>> FWIW, this is what I have been using for java. It will catch pretty
>>>> much everything except presubmit test failures.
>>>>
>>>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>>>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>>>> compileTestJava && ./gradlew compileJava
>>>>
>>>

-- 

Ruoyun  Huang


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Kenneth Knowles
Ah, that is likely caused by us having ill-defined tasks that cannot be
cached. Or is it that the configuration time is so significant?
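
One rough way to tell the two apart, offered as a sketch and not as anything
taken from the Beam build: Gradle's --dry-run flag configures the build and
resolves the task graph without executing task actions, so timing it
approximates configuration cost, while --profile writes a timing report under
build/reports/profile. The helper names and the GRADLE variable below are
assumptions made for illustration.

```shell
# Hypothetical helpers; GRADLE is an assumed override that defaults to the wrapper.
GRADLE=${GRADLE:-./gradlew}

# Configure the build and resolve the task graph, but skip all task actions.
# Timing this call approximates configuration time for the given tasks.
config_only() {
  "$GRADLE" --dry-run "$@"
}

# Run the tasks for real and let Gradle write its timing report.
profiled() {
  "$GRADLE" --profile "$@"
}
```

For example, "time config_only checkstyleMain javadoc compileJava" followed by
"profiled checkstyleMain javadoc compileJava" would give a first impression of
how much wall-clock time is configuration and how much is task execution.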

Kenn

On Mon, Feb 25, 2019 at 11:05 AM Ruoyun Huang  wrote:

> Out of curiosity as a light gradle user, I did a side by side comparison,
> and the readings confirm what Ken and Michael suggest.
>
> In the same repository, I ran gradle clean followed by each of the two
> commands and measured the runtime of each. The latter takes roughly *1/3*
> of the running time.
>
> time ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
> compileTestJava && ./gradlew compileJava
> real 9m29.330s
> user 0m11.330s
> sys 0m1.239s
>
> time ./gradlew spotlessApply checkstyleMain checkstyleTest javadoc
> findbugsMain compileJava compileTestJava
> real 3m35.573s
> user 0m2.701s
> sys 0m0.327s


Stateful processing : @OnWindowExpiration DoFn annotation

2019-02-25 Thread Augustin Lafanechere
Hello dear Beam community,
I have a question about the @OnWindowExpiration annotation on DoFn.
Does anyone have a working snippet that uses it?

I am trying to write a DoFn that makes a batch RPC when the window closes.
The RPC is a BigQuery call for a historical metric value that is updated by
an external process. I want to run this query and sum the results with the
events I have buffered in state. OnWindowExpiration looks like a practical
way to do this.

It looks like the function annotated with @OnWindowExpiration is never
called. My pipeline runs on Dataflow; perhaps it's not a supported feature on
this runner…

Here is a snippet of what I am trying to accomplish. The annotated function
never seems to be called, and the log line never appears. Am I missing
something?
I tried to replicate the logic found in this blog post and pieces of
information found in this PR.


// The window definition used in the pipeline is set in an upstream transform
// Window> w =
// Window.into(FixedWindows.of(Duration.standardMinutes(1L)))
// .withAllowedLateness(Duration.ZERO)
// .discardingFiredPanes();

public final class Enrich extends DoFn, KV> {

  @StateId("buffer")
  private final StateSpec>> bufferedEvents = StateSpecs.bag();

  @ProcessElement
  public void process(
      final ProcessContext context,
      final @StateId("buffer") BagState> bufferState) {
    bufferState.add(context.element());
    context.output(context.element());
  }

  @OnWindowExpiration
  public void onWindowExpiration(
      final @StateId("buffer") BagState> bufferState,
      final OutputReceiver> outputReceiver) {
    LOG.info("The window expired");
    for (KV enrichedEvent : enrichWithBigQuery(bufferState.read())) {
      outputReceiver.output(enrichedEvent);
    }
  }
}


Thanks for your help,


Augustin






Re: [VOTE] Release 2.11.0, release candidate #1

2019-02-25 Thread Charles Chen
+1.  I tested Python 3 support in batch and streaming mode (using wordcount
and streaming wordcount) on both DirectRunner and DataflowRunner.

On Mon, Feb 25, 2019 at 7:54 AM Łukasz Gajowy  wrote:

> Hi,
>
> https://issues.apache.org/jira/browse/BEAM-6697 Is this issue a release
> blocker? I'm asking because performance tests are not part of the release
> verification checklist. (Should they be?)
>
> The issue seems to be related to `google_cloud_bigdataoss_version` change
> (1.9.0 -> 1.9.12)
>
> Łukasz

[Call for items] Feb-March Beam Newsletter

2019-02-25 Thread Rose Nguyen
Hi Beamers:

Time to catch everyone up on community happenings!

Add to [1] the highlights from January to now (or planned events and talks)
that you want to share by *March 3rd*, 11:59 p.m. PDT.

We will collect the notes via Google docs but send out the final version
directly to the user mailing list. If you do not know how to format
something, it is OK to just put down the info and I will edit. I'll ship
out the newsletter on *March 4th*.

[1]
https://docs.google.com/document/d/1fgG11CVsH31Jflws8ON6DcAr5yEopBT30srdaQk2nQA

Cheers,
-- 
Rose Thị Nguyễn


Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Ruoyun Huang
nvm.  Don't take my previous non-scientific comparison (only ran it once)
too seriously. :-)

I repeated each command multiple times and the difference diminished.
Likely there was a transient error in caching.
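
For anyone repeating the measurement, a minimal sketch of a repeat-and-time
helper follows. The name time_runs is hypothetical and not part of the Beam
repository; it reports whole seconds of wall-clock time per run. Note that
bash's time keyword times only a single pipeline, so a chain joined with &&
has to be wrapped (for example in sh -c) to be measured as a whole.

```shell
# Hypothetical helper: run a command N times and print wall-clock seconds per run.
# The function body is a subshell, so its variables do not leak to the caller.
time_runs() (
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    # Discard the command's own output so only the timings are printed.
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
)
```

Example: "time_runs 3 ./gradlew checkstyleMain checkstyleTest javadoc", or
"time_runs 3 sh -c './gradlew checkstyleMain && ./gradlew checkstyleTest'" for
the chained form. A "./gradlew clean" between runs, as in the earlier
comparison, would have to be added to each timed command or done by hand.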

On Mon, Feb 25, 2019 at 3:38 PM Kenneth Knowles  wrote:

> Ah, that is likely caused by us having ill-defined tasks that cannot be
> cached. Or is it that the configuration time is so significant?
>
> Kenn

-- 

Ruoyun  Huang


Re: Stateful processing : @OnWindowExpiration DoFn annotation

2019-02-25 Thread Reuven Lax
The framework for OnWindowExpiration was added in that PR, but as far as I
can tell it hasn't yet been hooked up - nothing ever calls it. Adding Kenn
who might know more; is there a pending PR somewhere that finishes this
work?



Re: Stateful processing : @OnWindowExpiration DoFn annotation

2019-02-25 Thread Kenneth Knowles
I think that the contribution stalled after the first PR. It seems like
pretty low-hanging fruit to set up a ValidatesRunner test and make it work
in your favorite runner.

Kenn

On Mon, Feb 25, 2019 at 5:07 PM Reuven Lax  wrote:

> The framework for OnWindowExpiration was added in that PR, but as far as I
> can tell it hasn't yet been hooked up - nothing ever calls it. Adding Kenn
> who might know more; is there a pending PR somewhere that finishes this
> work?