Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 4:18 PM Tom Lane  wrote:
> Robert Haas  writes:
> >>> I do not think that the readability-vs-usefulness tradeoff is going
> >>> to be all that good there, anyway.  Certainly for testing purposes
> >>> it's going to be more useful to examine portions of a structured output.
>
> > I intensely dislike having information that we can't show in the text
> > format, or really, that we can't show in every format.
>
> Well, if it's relegated to a "jit = detail" option or some such,
> the readability objection could be overcome.  But I'm still not clear
> on how you'd physically wedge it into the output, at least not in a way
> that matches up with the proposal that non-text modes handle this stuff
> by producing sub-nodes for the existing types of expression fields.

Well, remember that the text format was the original format. The whole
idea of "groups" was an anachronism that I imposed on the text format
to make it possible to add other formats. It wasn't entirely natural,
because the text format basically indicated nesting by indentation,
and that wasn't going to work for XML or JSON. The text format also
felt free to repeat elements and assume the reader would figure it
out; repeating elements is OK in XML in general, but in JSON it's only
OK if the surrounding context is an array rather than an object.
Anyway, the point is that I (necessarily) started with whatever we had
and found a way to fit it into a structure. It seems like it ought to
be possible to go the other direction also, and figure out how to make
the structured data look OK as text.

Here's Andres's original example:

"Filter": {
  "Expr": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)",
  "JIT-Expr": "evalexpr_0_2",
  "JIT-Deform-Scan": "deform_0_3",
  "JIT-Deform-Outer": null,
  "JIT-Deform-Inner": null
}

Right now we show:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)

Andres proposed:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone); JIT-Expr: evalexpr_0_2, JIT-Deform-Scan:
deform_0_3

That's not ideal because it's all on one line, but that could be changed:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)
  JIT-Expr: evalexpr_0_2
  JIT-Deform-Scan: deform_0_3

I would propose either including null all the time or omitting it all
the time, so that we would either change the JSON output to...

"Filter": {
  "Expr": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)",
  "JIT-Expr": "evalexpr_0_2",
  "JIT-Deform-Scan": "deform_0_3"
}

Or the text output to:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)
  JIT-Expr: evalexpr_0_2
  JIT-Deform-Scan: deform_0_3
  JIT-Deform-Outer: null
  JIT-Deform-Inner: null

You could argue that this is inconsistent because the JSON format
shows a bunch of keys that are essentially parallel, and this text
format makes the Expr key essentially the primary value and the others
secondary. But since the text format is for human beings, and since
human beings are likely to find the Expr key to be the primary piece
of information, maybe that's totally fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Maciek Sakrejda
On Mon, Jan 27, 2020 at 11:01 AM Robert Haas  wrote:
> On Fri, Nov 15, 2019 at 8:05 PM Maciek Sakrejda  wrote:
> > On Fri, Nov 15, 2019 at 5:49 AM Robert Haas  wrote:
> > > Personally, I don't care very much about backward-compatibility, or
> > > about how hard it is for tools to parse. I want it to be possible, but
> > > if it takes a little extra effort, so be it.
> >
> > I think these are two separate issues. I agree on
> > backward-compatibility (especially if we can embed a server version in
> > structured EXPLAIN output to make it easier for tools to track format
> > differences), but not caring how hard it is for tools to parse? What's
> > the point of structured formats, then?
>
> To make the data easy to parse. :-)
>
> I mean, it's clear that, on the one hand, having a format like JSON
> that, as has recently been pointed out elsewhere, is parsable by a
> wide variety of tools, is advantageous. However, I don't think it
> really matters whether the somebody's got to look at a tag called
> Flump and match it up with the data in another tag called JIT-Flump,
> or whether there's a Flump group that has RegularStuff and JIT tags
> inside of it. There's just not much difference in the effort involved.
> Being able to parse the JSON or XML using generic code is enough of a
> win that the details shouldn't matter that much.

Having a structured EXPLAIN schema that's semantically consistent is
still valuable. At the end of the day, it's humans who are writing the
tools that consume that structured output. Given the sparse structured
EXPLAIN schema documentation, as someone who currently works on
EXPLAIN tooling, I'd prefer a trend toward consistency at the expense
of backward compatibility. (Of course, we should avoid gratuitous
changes.)

But I take back the version number suggestion after reading Tom's
response; that was naïve.

> I think if you were going to complain about the limitations of our
> current EXPLAIN output format, it'd make a lot more sense to focus on
> the way we output expressions.

That would be nice to have, but for what it's worth, my main complaint
would be about documentation (especially around structured formats).
The "Using EXPLAIN" section covers the basics, but understanding what
node types exist, and what fields show up for what nodes and what they
mean--that seems to be a big missing piece (I don't feel entitled to
this documentation; as a structured format consumer, I'm just pointing
out a deficiency). Contrast that with the great wire protocol
documentation. In some ways it's easier to work on native drivers than
on EXPLAIN tooling because the docs are thorough and well organized.




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Tom Lane
Robert Haas  writes:
>>> I do not think that the readability-vs-usefulness tradeoff is going
>>> to be all that good there, anyway.  Certainly for testing purposes
>>> it's going to be more useful to examine portions of a structured output.

> I intensely dislike having information that we can't show in the text
> format, or really, that we can't show in every format.

Well, if it's relegated to a "jit = detail" option or some such,
the readability objection could be overcome.  But I'm still not clear
on how you'd physically wedge it into the output, at least not in a way
that matches up with the proposal that non-text modes handle this stuff
by producing sub-nodes for the existing types of expression fields.

regards, tom lane




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Robert Haas
On Mon, Jan 27, 2020 at 12:41 PM Andres Freund  wrote:
> > I do not think that the readability-vs-usefulness tradeoff is going
> > to be all that good there, anyway.  Certainly for testing purposes
> > it's going to be more useful to examine portions of a structured output.
>
> I think I can live with that, I don't think it's going to be a very
> commonly used option. It's basically useful for regression tests, JIT
> improvements, and people that want to see whether they can change their
> query / schema to make better use of JIT - the latter category won't be
> many, I think.

I intensely dislike having information that we can't show in the text
format, or really, that we can't show in every format.

I might be outvoted, but I stand by that position.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Robert Haas
On Fri, Nov 15, 2019 at 8:05 PM Maciek Sakrejda  wrote:
> On Fri, Nov 15, 2019 at 5:49 AM Robert Haas  wrote:
> > Personally, I don't care very much about backward-compatibility, or
> > about how hard it is for tools to parse. I want it to be possible, but
> > if it takes a little extra effort, so be it.
>
> I think these are two separate issues. I agree on
> backward-compatibility (especially if we can embed a server version in
> structured EXPLAIN output to make it easier for tools to track format
> differences), but not caring how hard it is for tools to parse? What's
> the point of structured formats, then?

To make the data easy to parse. :-)

I mean, it's clear that, on the one hand, having a format like JSON
that, as has recently been pointed out elsewhere, is parsable by a
wide variety of tools, is advantageous. However, I don't think it
really matters whether the somebody's got to look at a tag called
Flump and match it up with the data in another tag called JIT-Flump,
or whether there's a Flump group that has RegularStuff and JIT tags
inside of it. There's just not much difference in the effort involved.
Being able to parse the JSON or XML using generic code is enough of a
win that the details shouldn't matter that much.

I think if you were going to complain about the limitations of our
current EXPLAIN output format, it'd make a lot more sense to focus on
the way we output expressions. If you want to mechanically parse one
of those expressions and figure out what it's doing - what functions
or operators are involved, and to what they are being applied - you
are probably out of luck altogether, and you are certainly not going
to have an easy time of it. I'm not saying we have to solve that
problem, but I believe it's a much bigger nuisance than the sort of
thing we are talking about here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Andres Freund
Hi,

On 2020-01-27 12:15:53 -0500, Tom Lane wrote:
> Maciek Sakrejda  writes:
> > On Fri, Nov 15, 2019 at 5:49 AM Robert Haas  wrote:
> >> Personally, I don't care very much about backward-compatibility, or
> >> about how hard it is for tools to parse. I want it to be possible, but
> >> if it takes a little extra effort, so be it.
>
> > I think these are two separate issues. I agree on
> > backward-compatibility (especially if we can embed a server version in
> > structured EXPLAIN output to make it easier for tools to track format
> > differences), but not caring how hard it is for tools to parse? What's
> > the point of structured formats, then?
>
> I'd not been paying any attention to this thread, but Andres just
> referenced it in another discussion, so I went back and read it.
> Here's my two cents:
>
> * I agree with Robert that conditionally changing "Output" to "Project" is
> an absolutely horrid idea.

Yea, I think I'm convinced on that front. I never liked the idea, and
the opposition has been pretty unanimous...


> That will break every tool that looks at this stuff, and it just flies
> in the face of the design principle that the output schema should be
> stable, and it'll be a long term pain-in-the-rear for regression test
> back-patching, and it will confuse users much more than it will help
> them.  The other idea of suppressing "Output" in cases where no
> projection is happening might be all right, but only in text format
> where we don't worry about schema stability.  Another idea perhaps is
> to emit "Output: all columns" (in text formats, less sure what to do
> in structured formats).

I think I like the "all columns" idea. Not what I'd do on a green field,
but...

If we were just dealing with the XML format, we could just add a

True/False
to the current

   a
   b
   ...


and it'd make plenty sense. but for json's
"Output": ["a", "b"]
and yaml's
Output:
  - "a"
  - "b"
that's not an option as far as I can tell. Not sure what to do about
that.



> * In the structured formats, I think it should be okay to convert
> expression-ish fields from being raw strings to being {Expression}
> sub-nodes with the raw string as one field.  Aside from making it easy
> to inject JIT info, that would also open the door to someday showing
> expressions in some more-parse-able format than a string, since other
> representations could also be added as new fields.  (I have a vague
> recollection of wanting a list of all the Vars used in an expression,
> for example.)

Cool. Being extendable seems like a good direction. That's what I
primarily dislike about the various work-arounds for how to associate
information about JIT by a "related" name.

That'd e.g. open the door to have both a normalized and an original
expression in the explain output. Which would be quite valuable for
some monitoring tools.


> * Unfortunately that does nothing for the problem of how to show
> per-expression JIT info in text format.  Maybe we just shouldn't.
> I do not think that the readability-vs-usefulness tradeoff is going
> to be all that good there, anyway.  Certainly for testing purposes
> it's going to be more useful to examine portions of a structured output.

I think I can live with that, I don't think it's going to be a very
commonly used option. It's basically useful for regression tests, JIT
improvements, and people that want to see whether they can change their
query / schema to make better use of JIT - the latter category won't be
many, I think.

Since this is going to be a default off option anyway, I don't think
we'd need to be as concerned with compatibility. But even leaving
compatibility aside, it's not that clear how to best attach information
in the current text format, without being confusing.

Greetings,

Andres Freund




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-27 Thread Tom Lane
Maciek Sakrejda  writes:
> On Fri, Nov 15, 2019 at 5:49 AM Robert Haas  wrote:
>> Personally, I don't care very much about backward-compatibility, or
>> about how hard it is for tools to parse. I want it to be possible, but
>> if it takes a little extra effort, so be it.

> I think these are two separate issues. I agree on
> backward-compatibility (especially if we can embed a server version in
> structured EXPLAIN output to make it easier for tools to track format
> differences), but not caring how hard it is for tools to parse? What's
> the point of structured formats, then?

I'd not been paying any attention to this thread, but Andres just
referenced it in another discussion, so I went back and read it.
Here's my two cents:

* I agree with Robert that conditionally changing "Output" to "Project" is
an absolutely horrid idea.  That will break every tool that looks at this
stuff, and it just flies in the face of the design principle that the
output schema should be stable, and it'll be a long term pain-in-the-rear
for regression test back-patching, and it will confuse users much more than
it will help them.  The other idea of suppressing "Output" in cases where
no projection is happening might be all right, but only in text format
where we don't worry about schema stability.  Another idea perhaps is
to emit "Output: all columns" (in text formats, less sure what to do in
structured formats).

* In the structured formats, I think it should be okay to convert
expression-ish fields from being raw strings to being {Expression}
sub-nodes with the raw string as one field.  Aside from making it easy
to inject JIT info, that would also open the door to someday showing
expressions in some more-parse-able format than a string, since other
representations could also be added as new fields.  (I have a vague
recollection of wanting a list of all the Vars used in an expression,
for example.)

* Unfortunately that does nothing for the problem of how to show
per-expression JIT info in text format.  Maybe we just shouldn't.
I do not think that the readability-vs-usefulness tradeoff is going
to be all that good there, anyway.  Certainly for testing purposes
it's going to be more useful to examine portions of a structured output.

* I'm not on board with the idea of adding a version number to the
structured output formats.  In the first place, it's too late, since
we didn't leave room for one to begin with.  In the second, an overall
version number just isn't very helpful for this sort of problem.  If a
tool sees a version number higher than the latest thing it knows, what's
it supposed to do, just fail?  In practice it could still extract an awful
lot of info, so that really isn't a desirable answer.  It's better if the
data structure is such that a tool can understand that some sub-part of
the data is something it can't interpret, and just ignore that part.
(This is more or less the same design principle that PNG image format
was built on, FWIW.)  Adding on fields to an existing node type easily
meets that requirement, as does inventing new sub-node types, and that's
all that we've done so far.  But I think that replacing a scalar field
value with a sub-node probably works too (at least for well-written
tools), so the expression change suggested above should be OK.

regards, tom lane




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-15 Thread Maciek Sakrejda
On Fri, Nov 15, 2019 at 5:49 AM Robert Haas  wrote:
> Personally, I don't care very much about backward-compatibility, or
> about how hard it is for tools to parse. I want it to be possible, but
> if it takes a little extra effort, so be it.

I think these are two separate issues. I agree on
backward-compatibility (especially if we can embed a server version in
structured EXPLAIN output to make it easier for tools to track format
differences), but not caring how hard it is for tools to parse? What's
the point of structured formats, then?

> My main concern is having
> the text output look good to human beings, because that is the primary
> format and they are the primary consumers.

Structured output is also for human beings, albeit indirectly. That
text is the primary format may be more of a reflection of the
difficulty of building and integrating EXPLAIN tools than its inherent
superiority (that said, I'll concede it's a concise and elegant format
for what it does). What if psql supported an EXPLAINER like it does
EDITOR?

For what it's worth, after thinking about this a bit, I'd like to see
structured EXPLAIN evolve into a more consistent format, even if it
means breaking changes (and I do think a version specifier at the root
of the plan would make this easier).




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-15 Thread Robert Haas
On Wed, Nov 13, 2019 at 3:03 PM Andres Freund  wrote:
> Well, it's not been that way since the format option was added, so ...

It was pretty close in the original version, but people keep trying to
be clever.

> > I also think that conditionally renaming "Output" to "Project" is a
> > super-bad idea. The idea of a format like this is that the "keys" stay
> > constant and the values change. If you need to tell people more, you
> > add more keys.
>
> Yea, I don't like the compat break either.  But I'm not so convinced
> that just continuing to collect cruft because of compatibility is worth
> it - I just don't see an all that high reliance interest for explain
> output.
>
> I think adding a new key is somewhat ok for !text, but for text that
> doesn't seem like an easy solution?
>
> I kind of like my idea somewhere downthread, in a reply to Maciek, of
> simply not listing "Output" for nodes that don't project.  While that's
> still a format break, it seems that tools already need to deal with
> "Output" not being present?

Yes, I think leaving out Output for a node that doesn't Project would
be fine, as long as we're consistent about it.

> > But making it always be a group doesn't necessarily seem like a bad
> > idea. I think, though, that you could handle this in other ways, like
> > by suffixing existing keys.  e.g. if you've got Index-Qual and Filter,
> > just do Index-Qual-JIT and Filter-JIT and call it good.
>
> Maciek suggested the same. But to me it seems going down that way will
> make the format harder and harder to understand? So I think I'd rather
> break compat here, and go for a group.

Personally, I don't care very much about backward-compatibility, or
about how hard it is for tools to parse. I want it to be possible, but
if it takes a little extra effort, so be it. My main concern is having
the text output look good to human beings, because that is the primary
format and they are the primary consumers.

> Personally I think the group naming choice for explain makes the the
> !text outputs much less useful than they could be - we basically force
> every tool to understand all possible keys, to make sense of formatted
> output. Instead of something like 'Filter: {"Qual":{"text" : "...",
> "JIT": ...}' where a tool only needed to understand that everything that
> has a "Qual" inside is a filtering expression, everything that has a
> "Project" is a projecting type of expression, ... a tool needs to know
> about "Inner Cond", "Order By", "Filter", "Recheck Cond", "TID Cond",
> "Join Filter", "Merge Cond", "Hash Cond", "One-Time Filter", ...

It's not that long of a list, and I don't know of a tool that tries to
do something in particular with all of those types of things anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-13 Thread Andres Freund
Hi,

On 2019-11-13 14:29:07 -0500, Robert Haas wrote:
> On Mon, Oct 28, 2019 at 7:21 PM Andres Freund  wrote:
> > Because that's the normal way to represent something non-existing for
> > formats like json? There's a lot of information we show always for !text
> > format, even if not really applicable to the context (e.g. Triggers for
> > select statements). I think there's an argument to made to deviate in
> > this case, but I don't think it's obvious.
> 
> I've consistently been of the view that anyone who thinks that the
> FORMAT option should affect what information gets displayed doesn't
> understand the meaning of the word "format." And I still feel that
> way.

Well, it's not been that way since the format option was added, so ...



> I also think that conditionally renaming "Output" to "Project" is a
> super-bad idea. The idea of a format like this is that the "keys" stay
> constant and the values change. If you need to tell people more, you
> add more keys.

Yea, I don't like the compat break either.  But I'm not so convinced
that just continuing to collect cruft because of compatibility is worth
it - I just don't see an all that high reliance interest for explain
output.

I think adding a new key is somewhat ok for !text, but for text that
doesn't seem like an easy solution?

I kind of like my idea somewhere downthread, in a reply to Maciek, of
simply not listing "Output" for nodes that don't project.  While that's
still a format break, it seems that tools already need to deal with
"Output" not being present?


> I also think that making the Filter field a group conditionally is a
> bad idea, for similar reasons.

Oh, yea, it's utterly terrible (I called it crappy in my email :)).


> But making it always be a group doesn't necessarily seem like a bad
> idea. I think, though, that you could handle this in other ways, like
> by suffixing existing keys.  e.g. if you've got Index-Qual and Filter,
> just do Index-Qual-JIT and Filter-JIT and call it good.

Maciek suggested the same. But to me it seems going down that way will
make the format harder and harder to understand? So I think I'd rather
break compat here, and go for a group.

Personally I think the group naming choice for explain makes the the
!text outputs much less useful than they could be - we basically force
every tool to understand all possible keys, to make sense of formatted
output. Instead of something like 'Filter: {"Qual":{"text" : "...",
"JIT": ...}' where a tool only needed to understand that everything that
has a "Qual" inside is a filtering expression, everything that has a
"Project" is a projecting type of expression, ... a tool needs to know
about "Inner Cond", "Order By", "Filter", "Recheck Cond", "TID Cond",
"Join Filter", "Merge Cond", "Hash Cond", "One-Time Filter", ...

Greetings,

Andres Freund




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-13 Thread Robert Haas
On Mon, Oct 28, 2019 at 7:21 PM Andres Freund  wrote:
> Because that's the normal way to represent something non-existing for
> formats like json? There's a lot of information we show always for !text
> format, even if not really applicable to the context (e.g. Triggers for
> select statements). I think there's an argument to made to deviate in
> this case, but I don't think it's obvious.

I've consistently been of the view that anyone who thinks that the
FORMAT option should affect what information gets displayed doesn't
understand the meaning of the word "format." And I still feel that
way.

I also think that conditionally renaming "Output" to "Project" is a
super-bad idea. The idea of a format like this is that the "keys" stay
constant and the values change. If you need to tell people more, you
add more keys.

I also think that making the Filter field a group conditionally is a
bad idea, for similar reasons. But making it always be a group doesn't
necessarily seem like a bad idea. I think, though, that you could
handle this in other ways, like by suffixing existing keys.  e.g. if
you've got Index-Qual and Filter, just do Index-Qual-JIT and
Filter-JIT and call it good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-12 Thread Andres Freund
Hi,

On 2019-11-12 13:42:10 -0800, Maciek Sakrejda wrote:
> On Mon, Oct 28, 2019 at 5:02 PM Andres Freund  wrote:
> > What I dislike about that is that it basically again is introducing
> 
> "again"? Am I missing some history here? I'd love to read up on this
> if there are mistakes to learn from.

I think I was mostly referring to mistakes we've made for the json etc
key names. By e.g. having expressions as "Function Call", "Table
Function Call", "Filter", "TID Cond", ... a tool that wants to interpret
the output needs awareness of all of these different names, rather than
knowing that everything with a sub-group "Expression" has to be an
expression.

I.e. instead of

"Plan": {
  "Node Type": "Seq Scan",
  "Parallel Aware": false,
  "Relation Name": "pg_class",
  "Schema": "pg_catalog",
  "Alias": "pg_class",
  "Startup Cost": 0.00,
  "Total Cost": 17.82,
  "Plan Rows": 385,
  "Plan Width": 68,
  "Output": ["relname", "tableoid"],
  "Filter": "(pg_class.relname <> 'foo'::name)"
}

we ought to have gone for

"Plan": {
  "Node Type": "Seq Scan",
  "Parallel Aware": false,
  "Relation Name": "pg_class",
  "Schema": "pg_catalog",
  "Alias": "pg_class",
  "Startup Cost": 0.00,
  "Total Cost": 17.82,
  "Plan Rows": 385,
  "Plan Width": 68,
  "Output": ["relname", "tableoid"],
  "Filter": {"Expression" : { "text": (pg_class.relname <> 'foo'::name)"}}
}

or something like that. Which'd then make it obvious how to add
information about JIT to each expression.


Whereas the proposal of the separate key name perpetuates the
messiness...


> > something that requires either pattern matching on key names (i.e. a key
> > of '(.*) JIT' is one that has information about JIT, and the associated
> > expresssion is in key $1), or knowing all the potential keys an
> > expression could be in.
> 
> That still seems less awkward than having to handle a Filter field
> that's either scalar or a group.

Yea, it's a sucky option :(


> Most current EXPLAIN options just add
> additional fields to the structured plan instead of modifying it, no?
> If that output is better enough, though, maybe we should just always
> make Filter a group and go with the breaking change? If tooling
> authors need to treat this case specially anyway, might as well evolve
> the format.

Yea, maybe that's the right thing to do. Would be nice to have some more
input...

Greetings,

Andres Freund




Re: JIT performance bug/regression & JIT EXPLAIN

2019-11-12 Thread Maciek Sakrejda
On Mon, Oct 28, 2019 at 5:02 PM Andres Freund  wrote:
> What I dislike about that is that it basically again is introducing

"again"? Am I missing some history here? I'd love to read up on this
if there are mistakes to learn from.

> something that requires either pattern matching on key names (i.e. a key
> of '(.*) JIT' is one that has information about JIT, and the associated
> expresssion is in key $1), or knowing all the potential keys an
> expression could be in.

That still seems less awkward than having to handle a Filter field
that's either scalar or a group. Most current EXPLAIN options just add
additional fields to the structured plan instead of modifying it, no?
If that output is better enough, though, maybe we should just always
make Filter a group and go with the breaking change? If tooling
authors need to treat this case specially anyway, might as well evolve
the format.

> Another alternative would be to just remove the 'Output' line when a
> node doesn't project - it can't really carry meaning in those cases
> anyway?

¯\_(ツ)_/¯

For what it's worth, I certainly wouldn't miss it.




Re: JIT performance bug/regression & JIT EXPLAIN

2019-10-28 Thread Andres Freund
Hi,

On 2019-10-28 15:05:01 -0400, Robert Haas wrote:
> On Fri, Sep 27, 2019 at 3:21 AM Andres Freund  wrote:
> > - JIT-Expr: whether the expression was JIT compiled (might e.g. not be
> >   the case because no parent was provided)
> > - JIT-Deform-{Scan,Outer,Inner}: wether necessary, and whether JIT 
> > accelerated.
> >
> > I don't like these names much, but ...
> >
> > For the deform cases I chose to display
> > a) the function name if JIT compiled
> > b) "false" if the expression is JIT compiled, deforming is
> >necessary, but deforming is not JIT compiled (e.g. because the slot type
> >wasn't fixed)
> > c) "null" if not necessary, with that being omitted in text mode.
> 
> I mean, why not just omit in all modes if it's not necessary? I don't
> see that making the information we produce randomly inconsistent
> between modes is buying us anything.

Because that's the normal way to represent something non-existing for
formats like json? There's a lot of information we show always for !text
format, even if not really applicable to the context (e.g. Triggers for
select statements). I think there's an argument to made to deviate in
this case, but I don't think it's obvious.

Abstract formatting reasons aside, it's actually useful to see where we
know we're dealing with tuples that don't need to be deformed and thus
overhead due to that cannot be relevant. Not sure if there's sufficient
consumers for that, but ...  We e.g. should verify that the "none"
doesn't suddenly vanish, because we broke the information that let us
infer that we don't need tuple deforming - and that's easier to
understand if there's an explicit field, rather than reasining from
absence. IMO.

Greetings,

Andres Freund




Re: JIT performance bug/regression & JIT EXPLAIN

2019-10-28 Thread Robert Haas
On Fri, Sep 27, 2019 at 3:21 AM Andres Freund  wrote:
> - JIT-Expr: whether the expression was JIT compiled (might e.g. not be
>   the case because no parent was provided)
> - JIT-Deform-{Scan,Outer,Inner}: wether necessary, and whether JIT 
> accelerated.
>
> I don't like these names much, but ...
>
> For the deform cases I chose to display
> a) the function name if JIT compiled
> b) "false" if the expression is JIT compiled, deforming is
>necessary, but deforming is not JIT compiled (e.g. because the slot type
>wasn't fixed)
> c) "null" if not necessary, with that being omitted in text mode.

I mean, why not just omit in all modes if it's not necessary? I don't
see that making the information we produce randomly inconsistent
between modes is buying us anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: JIT performance bug/regression & JIT EXPLAIN

2019-10-28 Thread Maciek Sakrejda
>But that's pretty crappy, because it means that the 'shape' of the output 
>depends on the jit_details option.

Yeah, that would be hard to work with. What about adding it as a sibling group?

"Filter": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)",
"Filter JIT": {
  "Expr": "evalexpr_0_2",
  "Deform Scan": "deform_0_3",
  "Deform Outer": null,
  "Deform Inner": null
}

Also not that pretty, but at least it's easier to work with (I also
changed the dashes to spaces since that's what the rest of EXPLAIN is
doing as a matter of style).

>But the compat break due to that change is not small- perhaps we could instead 
>mark that in another way?

We could add a "Projects" boolean key instead? Of course that's more
awkward in text mode. Maybe compat break is less of an issue in text
mode and we can treat this differently?

>I'm not sure that 'TRANS' is the best placeholder for the transition value 
>here. Maybe $TRANS would be clearer?

+1, I think the `$` makes it clearer that this is not a literal expression.

>For HashJoin/Hash I've added 'Outer Hash Key' and 'Hash Key' for each key, but 
>only in verbose mode.

That reads pretty well to me. What does the structured output look like?




Re: JIT performance bug/regression & JIT EXPLAIN

2019-09-29 Thread Andres Freund
Hi,

On 2019-09-27 00:20:53 -0700, Andres Freund wrote:
> Unfortunately I found a performance regression for JITed query
> compilation introduced in 12, compared to 11. Fixed in one of the
> attached patches 
> (v1-0009-Fix-determination-when-tuple-deforming-can-be-JIT.patch
> - which needs a better commit message).
> 
> The first question is when to push that fix. I'm inclined to just do so
> now - as we still do JITed tuple deforming in most cases, as well as
> doing so in 11 in the places this patch fixes, the risk of that seems
> low. But I can also see an arguments for waiting after 12.0.

Since nobody opined, I now have pushed that, and the other fix mentioned
later in that email.

I'd appreciate comments on the rest of the email, it's clear that we
need to improve the test infrastructure here. And also the explain
output for grouping sets...

Greetings,

Andres Freund