Re: RelNode#getDescription and memory consumption

Laurent Goujon Thu, 16 Aug 2018 08:30:01 -0700

Will do.

On Wed, Aug 15, 2018 at 4:55 PM Julian Hyde <jh...@apache.org> wrote:


> I see now.
>
> I think the problem only occurs when you call
> AbstractRelNode.recomputeDigest().
>
> The first time the digest is computed, the input RelNodes have a digest
> (and desc) as it has been set in AbstractRelNode’s constructor:
>
>   this.digest = getRelTypeName() + "#" + id;
>   this.desc = digest;
>
> Explain writer uses the “desc” field to identify inputs, but maybe it
> should use id or type + id. Or maybe the “desc” field should be final.
>
> By the way, the comment
>
>   // Substring uses the same underlying array of chars, so saves a bit
>   // of memory.
>
> was true until JDK 1.6 but is no longer true.
>
> Can you log a JIRA case, please?
>
> Julian
>
>
>
> > On Aug 15, 2018, at 2:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
> >
> > Sorry, I should have mentioned the method too: HepPlanner#buildFinalPlan
> > (when running RelOptRulesTest#testWindowInParenthesis())
> >
> > On Wed, Aug 15, 2018 at 2:36 PM Laurent Goujon <laur...@dremio.com> wrote:
> >
> >> It seems to happen when building the final plan: the hep planner goes
> >> recursively to each node to recompute the digest. In that RelNode tree,
> >> there are no more HepRelVertex nodes, and the digest now includes the whole
> >> description of the input(s).
> >>
> >> On Wed, Aug 15, 2018 at 2:33 PM Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> When I run that test I get
> >>>
> >>> LogicalProject(input=HepRelVertex#10,$0=$9)
> >>>
> >>> Have you screwed something up?
> >>>
> >>>> On Aug 15, 2018, at 2:23 PM, Laurent Goujon <laur...@dremio.com> wrote:
> >>>>
> >>>> Just ran RelOptRulesTest with a breakpoint in
> >>>> AbstractRelNode#computeDigest() and I'm able to observe this kind of
> >>>> digest:
> >>>>
> >>>> "LogicalProject(input=rel#6:LogicalWindow(input=rel#0:LogicalTableScan(table=[CATALOG,
> >>>> SALES, EMP]),window#0=window(partition {0} order by [0] range between
> >>>> UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])),$0=$9)"
> >>>>
> >>>> On Wed, Aug 15, 2018 at 2:09 PM Laurent Goujon <laur...@dremio.com> wrote:
> >>>>
> >>>>> Here's one (partial) example (truncated because it contains potentially
> >>>>> sensitive info; I didn't obfuscate it or try to reproduce it locally with
> >>>>> non-sensitive data):
> >>>>>
> >>>>>
> >>>>> "rel#8643738:LogicalProject.NONE.ANY([]).[](input=rel#8643736:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643702:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643668:LogicalUnion.NONE.ANY([]).[](input#0=rel#8643634:LogicalProject.NONE.ANY([]).[](input=rel#8643632:LogicalAggregate.NONE.ANY([]).[](input=rel#8643630:LogicalAggregate.NONE.ANY([]).[](input=rel#8643628:LogicalProject.NONE.ANY([]).[](input=rel#8643626:LogicalFilter.NONE.ANY([]).[](input=rel#8643624:LogicalProject.NONE.ANY([]).[](input=rel#8643622:LogicalProject.NONE.ANY([]).[](input=rel#8643842:MultiJoin.NONE.ANY([]).[](input#0=rel#8643838:LogicalProject.NONE.ANY([]).[](input=rel#8643615:MultiJoin.NONE.ANY([]).[](input#0=rel#8643603:LogicalProject.NONE.ANY([]).[](input=rel#8643601:SampleCrel.NONE.ANY([]).[](input=rel#8639853:ScanCrel.NONE.ANY([]).[](table="...
> >>>>>
> >>>>> The Logical* RelNodes don't override the computeDigest method, so this is
> >>>>> basically whatever AbstractRelNode#computeDigest is doing:
> >>>>>
> >>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L415
> >>>>>
> >>>>> Laurent
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Aug 15, 2018 at 1:57 PM Julian Hyde <jh...@apache.org> wrote:
> >>>>>
> >>>>>> I thought the digest only included the IDs of the inputs, not the digest
> >>>>>> of the inputs. Am I mistaken?
> >>>>>>
> >>>>>> Could you give an example of large description & digest?
> >>>>>>
> >>>>>>> On Aug 15, 2018, at 1:46 PM, Laurent Goujon <laur...@dremio.com> wrote:
> >>>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> I'm looking for some guidance here before opening JIRAs/pull requests.
> >>>>>>>
> >>>>>>> I'm examining a memory dump taken during a planning operation, and a
> >>>>>>> significant amount of memory is used by strings for RelNode digests and
> >>>>>>> descriptions (some strings are around 130 KB). In that particular case,
> >>>>>>> the RelNode tree is particularly deep, and since the digest is built
> >>>>>>> recursively, the deeper/wider the tree, the longer the digest.
> >>>>>>>
> >>>>>>> The easy solution would be to not recurse when adding inputs to the
> >>>>>>> digest: instead of adding each input's description, only add its type,
> >>>>>>> id and traits. Would this break parts of Calcite, or cause other
> >>>>>>> inconveniences because some use cases rely on the digest/description
> >>>>>>> being basically the whole tree in textual form?
> >>>>>>>
> >>>>>>> Laurent
> >>>>>>
> >>>>>>
> >>>
> >>>
>
>
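
For illustration, here is a minimal Java sketch of the growth described in this thread. It does not use Calcite: `DigestSketch`, `Node`, `recursiveDigest`, `shallowDigest` and `chain` are hypothetical names, and the digest format is only loosely modeled on the digests quoted in the messages. It contrasts a digest that embeds each input's full digest (so its length grows with the size of the subtree) with one that refers to inputs only by type and id, as both Laurent and Julian suggest.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model for illustration only; this is NOT
 * Calcite's AbstractRelNode. It contrasts two ways of building a digest.
 */
public class DigestSketch {
  private static int nextId = 0;

  static final class Node {
    final int id;
    final String typeName;
    final List<Node> inputs;

    Node(String typeName, List<Node> inputs) {
      this.id = nextId++;
      this.typeName = typeName;
      this.inputs = inputs;
    }

    /** Reference of the form "type#id", like the initial digest set in a constructor. */
    String shortRef() {
      return typeName + "#" + id;
    }

    /** Recursive digest: every input contributes its own full digest. */
    String recursiveDigest() {
      StringBuilder sb = new StringBuilder();
      sb.append("rel#").append(id).append(':').append(typeName).append('(');
      for (int i = 0; i < inputs.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append("input#").append(i).append('=').append(inputs.get(i).recursiveDigest());
      }
      return sb.append(')').toString();
    }

    /** Shallow digest: inputs are referenced only by type and id. */
    String shallowDigest() {
      StringBuilder sb = new StringBuilder();
      sb.append("rel#").append(id).append(':').append(typeName).append('(');
      for (int i = 0; i < inputs.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append("input#").append(i).append('=').append(inputs.get(i).shortRef());
      }
      return sb.append(')').toString();
    }
  }

  /** Builds a linear chain: one scan at the bottom, (depth - 1) projects on top. */
  static Node chain(int depth) {
    Node node = new Node("LogicalTableScan", Collections.<Node>emptyList());
    for (int i = 1; i < depth; i++) {
      node = new Node("LogicalProject", Collections.singletonList(node));
    }
    return node;
  }

  public static void main(String[] args) {
    for (int depth : new int[] {10, 100, 1000}) {
      Node top = chain(depth);
      System.out.printf("depth=%d: recursive=%d chars, shallow=%d chars%n",
          depth, top.recursiveDigest().length(), top.shallowDigest().length());
    }
  }
}
```

In this sketch, the recursive digest of the top node grows linearly with the depth of a chain, so keeping one such string per node costs memory roughly quadratic in depth, while the shallow digest stays nearly constant in length. The trade-off is that a shallow digest only identifies a node uniquely if its inputs' ids are themselves canonical, which is why the thread asks whether anything relies on the digest being the whole tree in textual form.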
