Re: [VOTE] Policies for managing Beam dependencies

2018-06-11 Thread Bashir Sadjad
FWIW, I also think that this has relevance for users. I am a user of Beam, not a contributor, and only monitor this list at a high level. But I think the dependency issue is something that many users have to deal with. It has bitten us at least twice over the last few months due to the fact that we

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Ahmet Altay
Thank you JB. For the wheel artifacts, Boyuan was trying to get the instructions from Robert and reproduce the artifacts. She can help you with this if you need. Ahmet On Mon, Jun 11, 2018 at 10:29 PM, Jean-Baptiste Onofré wrote: > Hi, > > sorry, I missed the wheel artifact. Something to add to

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Jean-Baptiste Onofré
Hi, sorry, I missed the wheel artifact. Something to add to the release guide ;) I will add it this morning, I think I know how to generate it ;) Regards JB On 12/06/2018 02:45, Pablo Estrada wrote: > Thanks everyone who has pitched in to validate the release! > > Boyuan Zhang and I have also run

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Jean-Baptiste Onofré
Hi, no problem, I can cut RC2 as soon as the cherry pick is done. Thanks for catching up! Please let me know when the cherry pick is done, or you can do the PR and I will do it, up to you. Regards JB On 12/06/2018 04:02, Udi Meiri wrote: > Another bug: reading from PubSub

Re: [VOTE] Policies for managing Beam dependencies

2018-06-11 Thread Ahmet Altay
I think this is relevant for users. It makes sense for users to know how Beam works with its dependencies and to understand how conflicts will be addressed and when dependencies will be upgraded. On Mon, Jun 11, 2018 at 9:09 PM, Kenneth Knowles wrote: > Do you think this has relevance for

Re: [VOTE] Policies for managing Beam dependencies

2018-06-11 Thread Kenneth Knowles
Do you think this has relevance for users? If not, it might be a good use of the new Confluence space. I'm not too familiar with the way permissions work, but perhaps we can have a more locked down area that is for policy decisions like this. Kenn On Mon, Jun 11, 2018 at 3:58 PM Chamikara

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Udi Meiri
Another bug: reading from PubSub with_attributes=True is broken on Python with Dataflow. https://issues.apache.org/jira/browse/BEAM-4536 JB, I'm making a PR that removes this keyword and I'd like to propose it as a cherrypick to 2.5.0. (feature should be fixed in the next release) On Mon, Jun

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Chamikara Jayalath
FYI: looks like Python tests are failing for Windows. JIRA is https://issues.apache.org/jira/browse/BEAM-4535. I don't think this is a release blocker but this should probably go in release notes (for any user that tries to run tests on Python source build). And we should try to incorporate a fix

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Pablo Estrada
Thanks everyone who has pitched in to validate the release! Boyuan Zhang and I have also run a few pipelines, and verified that they work properly (see release validation spreadsheet[1]). We have also found that the Game Stats pipeline is failing in Python Streaming Dataflow. I have filed

Re: [VOTE] Policies for managing Beam dependencies

2018-06-11 Thread Chamikara Jayalath
Hi All, Based on the vote (3 PMC +1s and no -1s) and based on the discussions in the doc (seems to be mostly positive), I think we can go ahead and implement some of the policies discussed so far. I have given some of the potential action items below. * Automatically generate human readable

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-11 Thread Mikhail Gryzykhin
+1 for having a wiki. One comment: Is there a specific reason for it to be a Confluence engine? I know that we use JIRA by Atlassian, but we also utilize Github that has its own wiki engine that is closer to code and much more lightweight. I would prefer wiki hosted on Github. --Mikhail On

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-11 Thread Kenneth Knowles
OK, yea, that all makes sense to me. Like this? - site/documentation: writing just for users - site/contribute: basic stuff as-is, writing for users to entice them, links to the next... - wiki/contributors: contributors writing just for each other And you also have - wiki/users: users

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-11 Thread Robert Bradshaw
On Fri, Jun 8, 2018 at 2:18 PM Kenneth Knowles wrote: > I disagree strongly here - I don't think the wiki will have appropriate > polish for users. Even if carefully polished I don't think the presentation > style is right, and it is not flexible. Power users will find it, of course. > I

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Mingmin Xu
EXPLAIN shows the execution plan from the SQL perspective only. After conversion to a Beam composite PTransform, there are more steps underneath, and each Runner reorganizes the Beam PTransforms again, which makes the final pipeline hard to read. In the SQL module itself, I don't see any difference between `toPTransform`

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Andrew Pilloud
That sounds correct. And because each rel node might have a different input there isn't a standard interface (like PTransform, PCollection> toPTransform()); Andrew On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles wrote: > Agree with that. It will be kind of tricky to generalize. I think there >

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-11 Thread Lukasz Cwik
Thanks all, it seems as though only Google needs the grace period. I'll wait for the shorter of BEAM-4512 or two weeks before merging https://github.com/apache/beam/pull/5571 On Wed, Jun 6, 2018 at 8:29 PM Kenneth Knowles wrote: > +1 > > Definitely a good opportunity to decouple your build

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Andrew Pilloud
Not quite a revert, we still want to keep the actual transformation inside a PTransform but the input of that PTransform will be different for each node type (joins have multiple inputs for example). We have this function as our builder right now: PTransform> toPTransform(); When I'm done we

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Kenneth Knowles
Agree with that. It will be kind of tricky to generalize. I think there are some criteria in this case that might apply in other cases: 1. Each rel node (or construct of a DSL) should have a PTransform for how it computes its result from its inputs. 2. The inputs to that PTransform should

Re: Multimap PCollectionViews' values updated rather than appended

2018-06-11 Thread Lukasz Cwik
Thanks for the snippet, updated BEAM-4470 with the additional details. On Mon, Jun 11, 2018 at 10:56 AM Carlos Alonso wrote: > Many thanks for your help. Actually, my use case emits the entire map > every time, so I guess I'm good to go with discarding mode. > > This test reproduces the issue: >

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Anton Kedin
Not answering the original question, but doesn't "explain" satisfy the SQL use case? Going forward we probably want to solve this in a more general way. We have at least 3 ways to represent the pipeline: - how runner executes it; - what it looks like when constructed; - what the user was

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-11 Thread Alan Myrvold
+1 (non-binding) tested some of the quickstarts On Sun, Jun 10, 2018 at 1:39 AM Tim wrote: > Tested by our team: > - mvn inclusion > - Avro, ES, Hadoop IF IO > - Pipelines run on Spark (Cloudera 5.12.0 YARN cluster) > - Reviewed release notes > > +1 > > Thanks also to everyone who helped get

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Kenneth Knowles
In other words, revert https://github.com/apache/beam/pull/4705/files, at least in spirit? I agree :-) Kenn On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud wrote: > We are currently converting the Calcite Rel tree to Beam by recursively > building a tree of nested PTransforms. This results in

Re: Building and visualizing the Beam SQL graph

2018-06-11 Thread Huygaa Batsaikhan
I was also wondering the same thing. I don't think there is any visualization tool for Beam. :( On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud wrote: > We are currently converting the Calcite Rel tree to Beam by recursively > building a tree of nested PTransforms. This results in a weird

Re: Beam SQL: Integrating runners & IO

2018-06-11 Thread Andrew Pilloud
Thanks for the great writeup Kenn! I really like the part about pushing the TableProvider abstraction into core and the IOs. This would make it really easy to extend the IOs supported by your SQL shell just by adding the appropriate IOs to the classpath. It would also make testing much easier as

Building and visualizing the Beam SQL graph

2018-06-11 Thread Andrew Pilloud
We are currently converting the Calcite Rel tree to Beam by recursively building a tree of nested PTransforms. This results in a weird nested graph in the Dataflow UI where each node contains its inputs nested inside of it. I'm going to change the internal data structure for converting the tree

Re: Multimap PCollectionViews' values updated rather than appended

2018-06-11 Thread Carlos Alonso
Many thanks for your help. Actually, my use case emits the entire map every time, so I guess I'm good to go with discarding mode. This test reproduces the issue:

Re: Beam SQL Improvements

2018-06-11 Thread Reuven Lax
Does DirectRunner do this today? On Mon, Jun 4, 2018 at 9:10 PM Lukasz Cwik wrote: > Shouldn't the runner isolate each instance of the pipeline behind an > appropriate class loader? > > On Sun, Jun 3, 2018 at 12:45 PM Reuven Lax wrote: > >> Just an update: Romain and I chatted on Slack, and I

Beam SQL: Integrating runners & IO

2018-06-11 Thread Kenneth Knowles
Hi all, Andrew mentioned something super cool about what he and Anton have done with Beam SQL: it now implements a JDBC driver (via Calcite Avatica). And since the sqlline client is tiny, it is just baked in. So you can java -jar the JDBC driver and run a little shell. But the shell only has the

Re: [FYI] New Apache Beam Swag Store!

2018-06-11 Thread Mikhail Gryzykhin
That's nice! More colors are appreciated :) --Mikhail On Sun, Jun 10, 2018 at 8:20 PM Kenneth Knowles wrote: > Sweet! Agree with Raghu :-) > > Kenn > > On Sun, Jun 10, 2018 at 6:06 AM Matthias Baetens < > baetensmatth...@gmail.com> wrote: > >> Great news, big thanks for all the work, Gris!

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-11 Thread Kenneth Knowles
Yup, I'm "kenn". Thanks! To the thread - if you want write access, please ask. Committers should certainly have it (but I'll do it lazily) and beyond that I think is up for discussion. Kenn On Mon, Jun 11, 2018 at 5:52 AM Daniel Kulp wrote: > Neither you nor JB was set up to be able to do

Re: [DISCUSS] Use Confluence wiki for non-user-facing stuff

2018-06-11 Thread Daniel Kulp
Neither you nor JB was set up to be able to do anything with the Wiki. I just enabled JB and yourself (assume “kenn” is you) to have permission to administer the space so you should be able to add other people. Dan > On Jun 9, 2018, at 9:57 AM, Kenneth Knowles wrote: > > Yea, we have

Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-11 Thread Kamil Szewczyk
Hi all, as a positive outcome of extending the Kubernetes cluster, at the bottom of https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_Analysis/37/consoleText and on the dedicated Slack channel https://apachebeam.slack.com/messages/CAB3W69SS/ we can observe better stability of the