Bryan Pendleton wrote:

I was thinking that there might be some smaller, easier, things that we
could do which would offer some smaller wins, but might still be worth it.

Thanks for taking an interest in this discussion, Bryan, and for writing-up some ideas!

I think that if these smaller wins also aid in the accomplishment of the bigger goal (i.e. emitting info as XML), then this is definitely a good approach.

Ideas that I had include:

 1) logQueryPlan output intermixes timing information ("we processed this
    many rows in this many milliseconds") with structural information
    ("we did a distinct scalar aggregate of a sort of a union"). I was
    thinking that we could refine the logQueryPlan behavior so that you
    could say things like:

     logQueryPlan=[nodeTree | estTime | actTime | estRows | actRows]

    so that the user could choose a subset of the information if they were
    only interested in the "shape" of the tree, say, and not the detailed
    performance numbers; the idea is just that we give the user a way to
pick a smaller subset of the information to give them a more approachable amount of output.

If this same kind of "subsetting" will also be used for the XML formatting, then something like this could be useful. In the particular breakdown shown in the example, though, it seems like "nodeTree" could get to be pretty large. And if we're looking at time/row values for each node in the tree, would the values exist on top of the "nodeTree" structure? I.e. "estTime" is "nodeTree" with additional info?

I think the big thing to figure out here is what "nodeTree" would be. It should theoretically be small enough to only display what's "relevant". But what's relevant for one query (or user) may not be relevant for another. So we'd have to either 1) dump anything which could potentially be "relevant", or 2) allow the person looking at the query plan to determine what s/he wants to look at. In the former case we end up back where we started with too large of a query plan (though use of the "brief | full" tag below could help); in the latter, we'd have to come up with a way to allow the user to retrieve bits and pieces of what is effectively an unstructured clob. The latter seems like it could require a lot of effort that might be better spent working on an XML format, which then inherently has a means of allowing a user to retrieve the pieces s/he thinks are important (namley, via XPath).

Of course, as I write all of that I'm thinking of the several-thousand line query plans that we get from queries like those in DERBY-1205 and DERBY-1777--which is probably not what the average person has in mind ;) For more manageable queries I agree with you that just being able to show a simple nodeTree with minimal info would be a big improvement over what we have now.

 2) The indentation for the qualification information seems to get lost,
    in my experience, making the display hard to read. Also, since "real"
    query plans are often quite deep, I wonder whether we should display
    things with a fixed 1-or-2 space indentation "step" rather than using
    tabs. The advantage of emitting hard tabs is that the user can reset
    their tab stops, but if they don't, the lines almost always wrap or
    get truncated. Avoiding line wrap could also take the form of slightly
    less wordy displays, so that instead of

      optimizer estimated row count:   1439201.17

    we could say

      est rows: 1439201

    so that as the indentation started to grow, we'd still not wrap lines.
    Of course, this trades off line-wrapping for less-self-evident output.

This is an idea that I think would be useful if we were planning to stick with the current (non-XML) format in the long term. But if the ultimate goal is to switch to XML with some related analysis/viewing tool, I'm not sure what the effect of the such a change would be? I guess if we go with the simplest idea of using Derby's own SQL/XML operators to retrieve pieces of the query plan, then smaller indentation would indeed be useful (because Derby preserves whitespace). And without knowing anything about where or how the query plans are actually written to logs, I would guess (hope) that this particular indentation change would be straightforward...

 3) There's a bunch of kind of detailed internal information in the query
    plan output, and in particular for many common queries we output a lot
    of information about settings which the user probably didn't set, but
    which are just set to their default value. I'm thinking about things
like "Ordered nulls: false" and "Unknown return value: false" and the like. I wonder whether we could change the various output routines so that they only emitted information like this if it was set to a non-default value, which seems like in the common case it might make the query plan display substantially shorter. If we didn't want to do this unconditionally, I thought that maybe we might give the user a knob like:

      logQueryPlanFormat=[brief | full]

Seems like a good suggestion, and one that could be useful for an XML format, as well. The only potential drawback is the addition of yet another Derby property, but I don't know if that's really a concern or not?

Do people think that things like this would be worth doing? Or would it be
better just to bite the bullet and pursue the full-on "XML format and a
separate analysis tool" proposals?

Insofar as these kinds of things can still be useful for the eventual (and currently just theoretical) XML format, I say Yes, they would be useful.

I do wonder if we have to worry about "backward compatibility" of the query plans? Are there people who may have written tools/utilities that expect the log query plan to keep its current shape? If so, then we'd probably have to add another property to toggle the new formatting on/off (with "off" being the default). And that would mean that we'd have to keep logic for generating two different query plans around in the code. I'm not sure how difficult that would be, nor whether use of a "full-on XML format" would make such logic better or worse...?

But all of that blabbering aside, if no one takes up the task of creating a full XML format, then anyone willing to implement changes like the ones mentioned above would still be doing query-plan-readers a favor.

So who's willing to fry what fish? :)

Army

Reply via email to