EXPLAIN shows the execution plan in SQL perspective only. After converting to a Beam composite PTransform, there're more steps underneath, each Runner re-org Beam PTransforms again which makes the final pipeline hard to read. In SQL module itself, I don't see any difference between `toPTransform` and `toPCollection`. We could have an easy-to-understand step name when converting RelNodes, but Runners show the graph to developers.
Mingmin On Mon, Jun 11, 2018 at 2:06 PM, Andrew Pilloud <apill...@google.com> wrote: > That sounds correct. And because each rel node might have a different > input there isn't a standard interface (like PTransform<PCollection<Row>, > PCollection<Row>> toPTransform()); > > Andrew > > On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles <k...@google.com> wrote: > >> Agree with that. It will be kind of tricky to generalize. I think there >> are some criteria in this case that might apply in other cases: >> >> 1. Each rel node (or construct of a DSL) should have a PTransform for how >> it computes its result from its inputs. >> 2. The inputs to that PTransform should actually be the inputs to the rel >> node! >> >> So I tried to improve #1 but I probably made #2 worse. >> >> Kenn >> >> On Mon, Jun 11, 2018 at 12:53 PM Anton Kedin <ke...@google.com> wrote: >> >>> Not answering the original question, but doesn't "explain" satisfy the >>> SQL use case? >>> >>> Going forward we probably want to solve this in a more general way. We >>> have at least 3 ways to represent the pipeline: >>> - how runner executes it; >>> - what it looks like when constructed; >>> - what the user was describing in DSL; >>> And there will probably be more, if extra layers are built on top of >>> DSLs. >>> >>> If possible, we probably should be able to map any level of abstraction >>> to any other to better understand and debug the pipelines. >>> >>> >>> On Mon, Jun 11, 2018 at 12:17 PM Kenneth Knowles <k...@google.com> wrote: >>> >>>> In other words, revert https://github.com/apache/beam/pull/4705/files, >>>> at least in spirit? I agree :-) >>>> >>>> Kenn >>>> >>>> On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud <apill...@google.com> >>>> wrote: >>>> >>>>> We are currently converting the Calcite Rel tree to Beam by >>>>> recursively building a tree of nested PTransforms. This results in a weird >>>>> nested graph in the dataflow UI where each node contains its inputs nested >>>>> inside of it. I'm going to change the internal data structure for >>>>> converting the tree from a PTransform to a PCollection, which will result >>>>> in a more accurate representation of the tree structure being built and >>>>> should simplify the code as well. This will not change the public >>>>> interface >>>>> to SQL, which will remain a PTransform. Any thoughts or objections? >>>>> >>>>> I was also wondering if there are tools for visualizing the Beam graph >>>>> aside from the dataflow runner UI. What other tools exist? >>>>> >>>>> Andrew >>>>> >>>> -- ---- Mingmin