I'm certainly aware of tools that read TEventSequence and expect it to have
certain structures. If we change the profiles significantly, we should
insert a version sigil in there so that a client could at least figure out
that they're looking at something new.

A bigger modeling conversation is the use of string keys everywhere. For
the timeline, for example, it's a list of (description, time) pairs, with
the descriptions being strings. An alternative would have been to put the
strings in Thrift itself. The latter, I think, makes tooling easier to
write (since you're calling "get_query_started_time()" instead of "for
description, time in timelines: if description == 'query started': return
time". I recognize working with Thrift is kind of a pain, but I want to
throw the thought out there if we're adding things to profiles.

-- Philip


On Tue, Sep 4, 2018 at 4:02 PM Todd Lipcon <[email protected]> wrote:

> Hey folks,
>
> I'm working on a patch to add some more diagnostics from the planning
> process into query profiles.
>
> Currently, only the planning "Timeline" is reported back as part of the
> response to createExecRequest. As part of the fetch-on-demand catalog work
> I'd like to start exposing various counters such as cache hit/miss counts,
> time spent on remote calls to the catalog, etc. Even in the existing code
> paths, I can see some similar metrics being useful.
>
> My current thinking is to remove the 'timeline'  (TEventSequence) field
> from TExecRequest and replace it with a full TRuntimeProfileNode. I'd then
> add some capability in the Java side to fill in counters, etc, in this
> structure.
>
> Any concerns with this approach before I go down this path? Are there any
> compatibility guarantees I need to uphold with the profile output of
> queries?
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Reply via email to