Re: Building and visualizing the Beam SQL graph

Andrew Pilloud Mon, 11 Jun 2018 14:07:07 -0700

That sounds correct. And because each rel node might have a different input
there isn't a standard interface (like PTransform<PCollection<Row>,
PCollection<Row>> toPTransform());


Andrew

On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles <[email protected]> wrote:

> Agree with that. It will be kind of tricky to generalize. I think there
> are some criteria in this case that might apply in other cases:
>
> 1. Each rel node (or construct of a DSL) should have a PTransform for how
> it computes its result from its inputs.
> 2. The inputs to that PTransform should actually be the inputs to the rel
> node!
>
> So I tried to improve #1 but I probably made #2 worse.
>
> Kenn
>
> On Mon, Jun 11, 2018 at 12:53 PM Anton Kedin <[email protected]> wrote:
>
>> Not answering the original question, but doesn't "explain" satisfy the
>> SQL use case?
>>
>> Going forward we probably want to solve this in a more general way. We
>> have at least 3 ways to represent the pipeline:
>>  - how runner executes it;
>>  - what it looks like when constructed;
>>  - what the user was describing in DSL;
>> And there will probably be more, if extra layers are built on top of DSLs.
>>
>> If possible, we probably should be able to map any level of abstraction
>> to any other to better understand and debug the pipelines.
>>
>>
>> On Mon, Jun 11, 2018 at 12:17 PM Kenneth Knowles <[email protected]> wrote:
>>
>>> In other words, revert https://github.com/apache/beam/pull/4705/files,
>>> at least in spirit? I agree :-)
>>>
>>> Kenn
>>>
>>> On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud <[email protected]>
>>> wrote:
>>>
>>>> We are currently converting the Calcite Rel tree to Beam by recursively
>>>> building a tree of nested PTransforms. This results in a weird nested graph
>>>> in the dataflow UI where each node contains its inputs nested inside of it.
>>>> I'm going to change the internal data structure for converting the tree
>>>> from a PTransform to a PCollection, which will result in a more accurate
>>>> representation of the tree structure being built and should simplify the
>>>> code as well. This will not change the public interface to SQL, which will
>>>> remain a PTransform. Any thoughts or objections?
>>>>
>>>> I was also wondering if there are tools for visualizing the Beam graph
>>>> aside from the dataflow runner UI. What other tools exist?
>>>>
>>>> Andrew
>>>>
>>>

Re: Building and visualizing the Beam SQL graph

Reply via email to