Re: Joining bounded and unbounded data not working using non-global window

2018-12-10 Thread Kenneth Knowles
Hi Shrijit,

+dev@beam.apache.org to fact check my memory here and re-raise the issue

You have hit a known usability problem. It has been discussed but not yet
addressed, partly because the focus has been on more holistic fixes and
partly because of backwards compatibility concerns, in case someone is
counting on the very unfortunate current behavior.

You have this triggering set up:

.triggering(
    AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(20)))

In our original design for triggers, this fires only once and then
"closes" the window, so data arriving after that firing is dropped. I
believe we agreed on the dev list that it should actually also emit the
remaining data at window expiration time.

What you want is probably this:

.triggering(
    Repeatedly.forever(
        AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardSeconds(20))))

But the recommended approach is to always use the AfterWatermark trigger
with early/late firings, so it would look like this:

.triggering(
    AfterWatermark.pastEndOfWindow()
        .withEarlyFirings(
            AfterProcessingTime
                .pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(20))))

This will emit a designated "on time" output as well as early firings
according to the latency you have set up.
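
To make the difference between firing once and firing repeatedly concrete,
here is a toy model in plain Java. It is not Beam code and does not use the
Beam API: the class TriggerSketch and its panes method are made-up names for
illustration, and the model ignores the watermark, window expiration, and
runner timing. It only mimics how a processing-time trigger with a 20 second
delay would group elements into panes.

```java
import java.util.ArrayList;
import java.util.List;

public class TriggerSketch {
    /**
     * Groups arrival times (seconds of processing time, ascending) into panes.
     * A pane fires delaySec after its first element arrives. If repeat is
     * false, the trigger finishes after the first firing and all later
     * elements are dropped, mimicking the "closed" window.
     */
    static List<List<Long>> panes(long[] arrivals, long delaySec, boolean repeat) {
        List<List<Long>> fired = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long fireAt = -1;
        for (long t : arrivals) {
            if (!current.isEmpty() && t >= fireAt) {
                fired.add(current); // the pane's timer went off before this element
                current = new ArrayList<>();
                if (!repeat) {
                    return fired; // trigger finished: remaining elements are dropped
                }
            }
            if (current.isEmpty()) {
                fireAt = t + delaySec; // first element of a new pane starts the timer
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            fired.add(current); // the timer of the last open pane eventually fires
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 5, 25, 30, 60};
        System.out.println(panes(arrivals, 20, false)); // [[0, 5]]
        System.out.println(panes(arrivals, 20, true));  // [[0, 5], [25, 30], [60]]
    }
}
```

With arrivals at 0, 5, 25, 30 and 60 seconds, the one-shot trigger emits only
the pane [0, 5] and loses everything after it, while the repeated trigger
emits [0, 5], [25, 30] and [60].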

If this does not solve your problem, can you say more about what is going
wrong?

Kenn

On Mon, Dec 10, 2018 at 7:41 PM Reza Ardeshir Rokni 
wrote:

> Hi,
>
> A couple of thoughts:
>
> 1- If the amount of data in HBase that you need to join with is small and
> does not change, could you use a side input? If it does change, you could
> try making use of the "slowly-changing lookup cache" pattern (ref below).
> 2- If the amount of data is large, would a direct HBase client call from a
> DoFn work to get the data you need to enrich the element? This is similar
> to the "calling external services" pattern (ref below).
>
> Ref :
> https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1
>
> Cheers
>
> Reza
>
> On Tue, 11 Dec 2018 at 00:12, Shrijit Pillai 
> wrote:
>
>> Hello,
>>
>> I'm trying to join an unbounded data source and a bounded one using
>> CoGroupByKey. The bounded data source is HBase and the unbounded one is
>> Kafka.
>>
>> The co-group works if the global window strategy is used but not with a
>> non-global one. I've tried the accumulatingFiredPanes mode (using the
>> non-global window) but that didn't help either. Am I missing something to
>> make the co-group work using a non-global window like FixedWindows, or is
>> the GlobalWindow the only way to go about it? I'm using Beam 2.8.0.
>>
>> Here's the code snippet:
>> https://gist.github.com/shrijitpillai/5e9e642f92dd23b3b7bd60e3ce8056bb
>>
>> Thanks
>> Shrijit
>>
>
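
The first suggestion quoted above, a lookup that is reloaded every so often,
can be sketched outside of Beam as a small cache with a time-to-live. This is
only a plain-Java analogue under assumptions: RefreshingLookup and its loader
and clock parameters are hypothetical names, the loader stands in for
whatever reads the lookup data from HBase, and as far as I recall the pattern
in the referenced post is built on side inputs rather than on a class like
this one.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class RefreshingLookup<K, V> {
    private final Supplier<Map<K, V>> loader; // hypothetical: e.g. a scan of the lookup table
    private final long ttlMillis;             // how long a loaded snapshot stays valid
    private final LongSupplier clock;         // injectable so the refresh can be tested
    private Map<K, V> snapshot;
    private long loadedAt;

    public RefreshingLookup(Supplier<Map<K, V>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the value for key, reloading the whole snapshot once it is older than the TTL. */
    public synchronized V get(K key) {
        long now = clock.getAsLong();
        if (snapshot == null || now - loadedAt >= ttlMillis) {
            snapshot = loader.get();
            loadedAt = now;
        }
        return snapshot.get(key);
    }

    public static void main(String[] args) {
        RefreshingLookup<String, String> lookup = new RefreshingLookup<>(
            () -> Collections.singletonMap("user-1", "gold"),
            60_000,
            System::currentTimeMillis);
        System.out.println(lookup.get("user-1")); // gold
        System.out.println(lookup.get("user-2")); // null
    }
}
```

The trade-off is the same as in the email: a small, slowly changing table can
be held in memory and refreshed as a whole, while a large one is better
served by a per-element client call.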


Re: A new Beam Runner on Apache Nemo

2018-12-10 Thread 송원욱
It's been a while, but just to let you know that there's a PR up regarding
the issue! Anyone who's interested can take a look at
https://github.com/apache/beam/pull/7236.
Wonook


On Mon, Nov 19, 2018 at 2:12 PM, 송원욱 wrote:

> Thanks for the reply and the help!
>
> For the time being, we are thinking about keeping the Runner outside Beam,
> as a number of developments are still ongoing regarding a few features
> for stream processing. I'll submit a PR for the
> website in a short time with the details for the capability matrix, and on
> how to use our Runner, with external links and references, etc.
>
> Regarding the portability layer, at the moment we have been focusing on
> supporting the various features supported by the Java Beam SDK on our
> system and improving the performance of the system itself, but we will
> definitely work on the portability layer with the Nemo Runner, as it sounds
> more than exciting to be able to run Python programs on Apache Nemo. I'll
> definitely check up on the ValidatesRunner tests. We have been running our
> own tests as well, so we are quite confident that it will run without many
> problems. Thanks for the tip for the Maven project!
>
> Thanks,
> Wonook
>
>
> On Sat, Nov 17, 2018 at 1:48 AM, Kenneth Knowles wrote:
>
>> Hi Wonook,
>>
>> Very cool! I see it here:
>> https://github.com/apache/incubator-nemo/tree/master/compiler/frontend/beam/src/main/java/org/apache/nemo/compiler/frontend/beam
>>
>> Some more details on what Max said about running the ValidatesRunner
>> tests:
>>
>>  - if you are planning to contribute the runner to Beam, you can use the
>> other runners as examples and generally the whole community is likely to
>> keep your config up to date
>>
>>  - if you are planning to keep the runner as part of Apache Nemo, then I
>> see you are using Maven to build so you can use an old snapshot as an
>> example, like this:
>> https://github.com/apache/beam/blob/v2.4.0/runners/gearpump/pom.xml#L55
>>
>> Kenn
>>
>> On Fri, Nov 16, 2018 at 3:16 AM Maximilian Michels 
>> wrote:
>>
>>> Hi Wonook,
>>>
>>> First of all, welcome to the Beam community! It is great to see another
>>> Runner emerging.
>>>
>>> If you're planning to contribute your Runner to Beam, you should verify
>>> the compatibility with the ValidatesRunner integration tests. Then open
>>> a PR with documentation, a Runner page, and updates to the matrix.
>>>
>>> If you're planning to leave the Runner outside Beam for the time being,
>>> please submit a Runner page for the Beam website. The page should
>>> contain information on how to use the Runner and a link to the external
>>> web site with up-to-date information.
>>>
>>> Feel free to ask here or in our Slack channel if you have more questions.
>>>
>>> I'm also curious, have you looked into integrating portability with the
>>> Nemo Runner?
>>>
>>> Thanks,
>>> Max
>>>
>>> On 16.11.18 06:51, 송원욱 wrote:
>>> > Hello all!
>>> >
>>> > I'm a member of the Apache Nemo community, another Apache project for
>>> > processing big data, focusing on easy-to-use, flexible optimizations for
>>> > various deployment environments. More information can be seen on our
>>> > website. We've been building the system for
>>> > quite a while now, and we have been using Apache Beam as one of the
>>> > programming layers that we support for writing data processing
>>> > applications. We have already taken a look at the capability matrix of
>>> > Beam runners and the runner authoring guide, and we have been
>>> > successful in implementing a large portion of the capability criteria.
>>> >
>>> > With the progress, we wish to be able to list our runner as one of the
>>> > Beam runners, to be able to notify the users that our system supports
>>> > Beam, and that Beam users have another option to choose from for running
>>> > their data processing applications. It would be lovely to know the
>>> > details of the process required for it!
>>> >
>>> > Thanks!
>>> >
>>> >
>>> >   Wonook
>>> >
>>>
>>


Re: OOO

2018-12-10 Thread Thomas Weise
Cute :)

Enjoy the time with the family.

Thomas

On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía  wrote:

> Thanks for the community awareness, enjoy the time with the baby and
> see you soon.
>
> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik  wrote:
> >
> > I'll be away for the next three months taking care of my little one[1]
> > and am excited to see what happens within Apache Beam when I return.
> >
> > I have been mainly focusing on the portability and SplittableDoFn
> > efforts. If there are questions while I'm out, feel free to reach out to
> > this dev@ list as there are several community members that have been
> > involved.
> >
> > For portability related stuff:
> > Thomas Weise
> > Robert Bradshaw
> > Maximilian Michels
> > Ankur Goenka
> >
> > For SplittableDoFn stuff:
> > Robert Bradshaw
> > Ismael Mejia
> > JB Onofre
> >
> > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A
>


Review requested for [BEAM-6170] Change Nexmark stuckness warnings to not fail pipeline

2018-12-10 Thread Sam Whittle
https://github.com/apache/beam/pull/7191

Thanks!
Sam


Re: OOO

2018-12-10 Thread Ismaël Mejía
Thanks for the community awareness, enjoy the time with the baby and
see you soon.

On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik  wrote:
>
> I'll be away for the next three months taking care of my little one[1] and am 
> excited to see what happens within Apache Beam when I return.
>
> I have been mainly focusing on the portability and SplittableDoFn efforts. If 
> there are questions while I'm out, feel free to reach out to this dev@ list 
> as there are several community members that have been involved.
>
> For portability related stuff:
> Thomas Weise
> Robert Bradshaw
> Maximilian Michels
> Ankur Goenka
>
> For SplittableDoFn stuff:
> Robert Bradshaw
> Ismael Mejia
> JB Onofre
>
> 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A


Beam Dependency Check Report (2018-12-10)

2018-12-10 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

Dependency Name      Current Version  Latest Version  Current Release Date  Latest Release Date  JIRA Issue
future               0.16.0           0.17.1          2016-10-27            2018-12-10           BEAM-5968
google-cloud-pubsub  0.35.4           0.39.0          2018-06-08            2018-12-10           BEAM-5539
oauth2client         3.0.0            4.1.3           None                  2018-12-10           BEAM-6089
pytz                 2018.4           2018.7          2018-05-10            2018-12-10           BEAM-5893

High Priority Dependency Updates Of Beam Java SDK:

Dependency Name                                     Current Version   Latest Version    Current Release Date  Latest Release Date  JIRA Issue
com.rabbitmq:amqp-client                            4.6.0             5.5.1             2018-03-26            2018-11-29           BEAM-5895
org.apache.rat:apache-rat-tasks                     0.12              0.13              2016-06-07            2018-10-13           BEAM-6039
com.google.auto.service:auto-service                1.0-rc2           1.0-rc4           2014-10-25            2017-12-11           BEAM-5541
com.gradle:build-scan-plugin                        1.13.1            2.1               2018-04-10            2018-12-10           BEAM-5543
org.conscrypt:conscrypt-openjdk                     1.1.3             1.4.1             2018-06-04            2018-11-01           BEAM-5748
org.elasticsearch:elasticsearch                     6.4.0             7.0.0-alpha1      2018-08-18            2018-11-13           BEAM-6090
org.elasticsearch:elasticsearch-hadoop              5.0.0             7.0.0-alpha1      2016-10-26            2018-11-13           BEAM-5551
org.elasticsearch.client:elasticsearch-rest-client  6.4.0             7.0.0-alpha1      2018-08-18            2018-11-13           BEAM-6091
org.elasticsearch.test:framework                    6.4.0             7.0.0-alpha1      2018-08-18            2018-11-13           BEAM-6092
io.grpc:grpc-auth                                   1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5896
io.grpc:grpc-context                                1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5897
io.grpc:grpc-core                                   1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5898
io.grpc:grpc-netty                                  1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5899
io.grpc:grpc-protobuf                               1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5900
io.grpc:grpc-stub                                   1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5901
io.grpc:grpc-testing                                1.13.1            1.17.1            2018-06-21            2018-12-07           BEAM-5902
com.google.code.gson:gson                           2.7               2.8.5             2016-06-14            2018-05-22           BEAM-5558
org.apache.hbase:hbase-common                       1.2.6             2.1.1             2017-05-29            2018-10-27           BEAM-5560
org.apache.hbase:hbase-hadoop-compat                1.2.6             2.1.1             2017-05-29            2018-10-27           BEAM-5561
org.apache.hbase:hbase-hadoop2-compat               1.2.6             2.1.1             2017-05-29            2018-10-27           BEAM-5562
org.apache.hbase:hbase-server                       1.2.6             2.1.1             2017-05-29            2018-10-27           BEAM-5563
org.apache.hbase:hbase-shaded-client                1.2.6             2.1.1             2017-05-29            2018-10-27           BEAM-5564
org.apache.hive:hive-cli                            2.1.0             3.1.1             2016-06-17            2018-10-24           BEAM-5566
org.apache.hive:hive-common                         2.1.0             3.1.1             2016-06-17            2018-10-24           BEAM-5567
org.apache.hive:hive-exec                           2.1.0             3.1.1             2016-06-17            2018-10-24           BEAM-5568
org.apache.hive.hcatalog:hive-hcatalog-core         2.1.0             3.1.1             2016-06-17            2018-10-24           BEAM-5569
net.java.dev.javacc:javacc                          4.0               7.0.4             2006-03-17            2018-09-17           BEAM-5570
javax.servlet:javax.servlet-api                     3.1.0             4.0.1             2013-04-25            2018-04-20           BEAM-5750
redis.clients:jedis                                 2.9.0             3.0.0             2016-07-22            2018-12-06           BEAM-6125
org.eclipse.jetty:jetty-server                      9.2.10.v20150310  9.4.14.v20181114  2015-03-10            2018-11-14           BEAM-5752
org.eclipse.jetty:jetty-servlet                     9.2.10.v20150310  9.4.14.v20181114  2015-03-10            2018-11-14           BEAM-5753
net.java.dev.jna:jna                                4.1.0             5.1.0             2014-03-06            2018-11-14           BEAM-5573
junit:junit                                         4.12              4.13-beta-1       2014-12-04            2018-11-25           BEAM-6127
com.esotericsoftware:kryo                           4.0.2             5.0.0-RC1         2018-03-20            2018-06-19           BEAM-5809
com.esotericsoftware.kryo:kryo                      2.21              2.24.0            2013-02-27            2014-05-04           BEAM-5574
org.apache.kudu:kudu-client                         1.4.0             1.8.0             2017-06-05            2018-10-16           BEAM-5575
io.dropwizard.metrics:metrics-core

Re: beam9 failing most of the python tests

2018-12-10 Thread Robert Bradshaw
The same error is impacting our postcommit tests. Who has permissions to
reboot these machines?

On Sat, Dec 8, 2018 at 3:13 AM Ankur Goenka  wrote:

> Virtual env setup is failing because of the following error. Can we reboot
> the machine to see if it fixes the issue?
>
> :beam-sdks-python:setupVirtualenv FAILED
> Traceback (most recent call last):
> New python executable in
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Portable_Python_Commit@2
> /src/build/gradleenv/1327086738/bin/python2
> File "/usr/lib/python3/dist-packages/virtualenv.py", line 2363, in <module>
> Also creating executable in
> /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Portable_Python_Commit@2
> /src/build/gradleenv/1327086738/bin/python
> main()
> File "/usr/lib/python3/dist-packages/virtualenv.py", line 719, in main
> symlink=options.symlink)
> File "/usr/lib/python3/dist-packages/virtualenv.py", line 942, in
> create_environment
> site_packages=site_packages, clear=clear, symlink=symlink))
> File "/usr/lib/python3/dist-packages/virtualenv.py", line 1423, in
> install_python
> raise e
> OSError: [Errno 11] Resource temporarily unavailable
> Running virtualenv with interpreter /usr/bin/python2
>
> On Mon, Dec 3, 2018 at 1:12 PM Ankur Goenka  wrote:
>
>> Hi,
>>
>> I see that beam9 is failing a significantly larger number of Python-related
>> builds [1]. This also results in more failures of
>> beam_PreCommit_Portable_Python_Commit [2] on beam9.
>> Can someone with access to beam9 take a look?
>>
>> Thanks,
>> Ankur
>>
>>
>> [1] https://builds.apache.org/computer/beam9/builds
>> [2]
>> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/buildTimeTrend
>>
>