Re: Building and visualizing the Beam SQL graph

Mingmin Xu Mon, 11 Jun 2018 14:27:56 -0700

EXPLAIN shows the execution plan from the SQL perspective only. After conversion
to a Beam composite PTransform there are more steps underneath, and each Runner
reorganizes the Beam PTransforms again, which makes the final pipeline hard to read.
Within the SQL module itself, I don't see any difference between `toPTransform` and
`toPCollection`. We could give each step an easy-to-understand name when
converting RelNodes, but it is the Runners that show the graph to developers.
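
For reference, here is a toy sketch of how step names might come out under the two conversion styles discussed in the quoted thread below. It is not Beam or Calcite code: `Node`, `nestedStepNames` and `flatStepNames` are hypothetical names invented for illustration, and the "/"-joined path only assumes that a step expanded inside a composite shows up nested under it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Beam or Calcite classes.
final class StepNameSketch {

  /** Hypothetical stand-in for a rel node: a name plus its input nodes. */
  static final class Node {
    final String name;
    final List<Node> inputs;

    Node(String name, Node... inputs) {
      this.name = name;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /**
   * "toPTransform" style: each node expands its inputs inside its own
   * composite, so every input's step name is prefixed by its consumer.
   */
  static List<String> nestedStepNames(Node node, String parentPath) {
    String path = parentPath.isEmpty() ? node.name : parentPath + "/" + node.name;
    List<String> steps = new ArrayList<>();
    for (Node input : node.inputs) {
      steps.addAll(nestedStepNames(input, path));
    }
    steps.add(path);
    return steps;
  }

  /**
   * "toPCollection" style: inputs are built first and each node is applied
   * at the top level, so the step names end up as siblings.
   */
  static List<String> flatStepNames(Node node) {
    List<String> steps = new ArrayList<>();
    for (Node input : node.inputs) {
      steps.addAll(flatStepNames(input));
    }
    steps.add(node.name);
    return steps;
  }

  public static void main(String[] args) {
    // Project <- Filter <- Scan
    Node plan = new Node("Project", new Node("Filter", new Node("Scan")));
    System.out.println(nestedStepNames(plan, ""));
    System.out.println(flatStepNames(plan));
  }
}
```

In this model the nested style places the scan step underneath the project step, while the flat style lists the three steps side by side in dependency order. How a given Runner then regroups or renames those steps is outside what the sketch covers.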


Mingmin

On Mon, Jun 11, 2018 at 2:06 PM, Andrew Pilloud <apill...@google.com> wrote:

> That sounds correct. And because each rel node might have a different
> input, there isn't a standard interface (like PTransform<PCollection<Row>,
> PCollection<Row>> toPTransform()).
>
> Andrew
>
> On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles <k...@google.com> wrote:
>
>> Agree with that. It will be kind of tricky to generalize. I think there
>> are some criteria in this case that might apply in other cases:
>>
>> 1. Each rel node (or construct of a DSL) should have a PTransform for how
>> it computes its result from its inputs.
>> 2. The inputs to that PTransform should actually be the inputs to the rel
>> node!
>>
>> So I tried to improve #1 but I probably made #2 worse.
>>
>> Kenn
>>
>> On Mon, Jun 11, 2018 at 12:53 PM Anton Kedin <ke...@google.com> wrote:
>>
>>> Not answering the original question, but doesn't "explain" satisfy the
>>> SQL use case?
>>>
>>> Going forward we probably want to solve this in a more general way. We
>>> have at least 3 ways to represent the pipeline:
>>>  - how the runner executes it;
>>>  - what it looks like when constructed;
>>>  - what the user described in the DSL.
>>> And there will probably be more, if extra layers are built on top of
>>> DSLs.
>>>
>>> If possible, we probably should be able to map any level of abstraction
>>> to any other to better understand and debug the pipelines.
>>>
>>>
>>> On Mon, Jun 11, 2018 at 12:17 PM Kenneth Knowles <k...@google.com> wrote:
>>>
>>>> In other words, revert https://github.com/apache/beam/pull/4705/files,
>>>> at least in spirit? I agree :-)
>>>>
>>>> Kenn
>>>>
>>>> On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud <apill...@google.com>
>>>> wrote:
>>>>
>>>>> We are currently converting the Calcite Rel tree to Beam by
>>>>> recursively building a tree of nested PTransforms. This results in a weird
>>>>> nested graph in the Dataflow UI where each node contains its inputs nested
>>>>> inside of it. I'm going to change the internal data structure for
>>>>> converting the tree from a PTransform to a PCollection, which will result
>>>>> in a more accurate representation of the tree structure being built and
>>>>> should simplify the code as well. This will not change the public interface
>>>>> to SQL, which will remain a PTransform. Any thoughts or objections?
>>>>>
>>>>> I was also wondering if there are tools for visualizing the Beam graph
>>>>> aside from the Dataflow runner UI. What other tools exist?
>>>>>
>>>>> Andrew
>>>>>
>>>>


-- 
----
Mingmin
