Re: [HACKERS] pgbench regression test failure

Fabien COELHO Mon, 20 Nov 2017 00:46:33 -0800


Hello Tom,


Thanks for having a look at this bug fix.

So we fixed the reported TPS rate, which was nowhere near reality,
and the per-script stats are sane now.  Good so far, but this
still has two problems IMO:

1. The per-script stats shouldn't be printed at all if there's
only one script.  They're redundant with the overall stats.


Indeed.

I think the output should tend to be the same for possible automaticprocessing, whether there is one script or more, even at the price of someredundancy.


Obviously this is highly debatable.

2. ISTM that we should report that 100% of the transactions were
above the latency limit, not 33%; that is, the appropriate base
for the "number of transactions above the latency limit" percentage
is the number of actual transactions not the number of scheduled
transactions.


Hmmm. Allow me to disagree.

If the user asked for something, then this is the reference to take andthe whole report should be relative to that. Also, changing this makes thereport figures not adding up:


before:
  number of transactions skipped: 417 (90.260 %)
  number of transactions above the 1.0 ms latency limit: 45 (9.740 %)

Both above percent are on the same reference, fine. Can be added, 100% oftransactions we required had issues.


after:
  number of transactions skipped: 457 (90.855 %)
  number of transactions above the 1.0 ms latency limit: 46 (100.000 %)

Percents do not add up anymore. I do not think that this is animprovement.

I also noticed that if I specify "-f sleep-100.sql" more than once,
the per-script TPS reports are out of line.  This is evidently because
that calculation isn't excluding skipped xacts; but if we're going to
define tps as excluding skipped xacts, surely we should do so there too.


I do not think that we should exclude skipped xacts.

I'm also still exceedingly unhappy about the NaN business.
You have this comment in printSimpleStats:
        /* print NaN if no transactions where executed */
but I find that unduly optimistic.  It should really read more like
"if no transactions were executed, at best we'll get some platform-
dependent spelling of NaN.  At worst we'll get a SIGFPE."

Hmmm. Alas you must be right about spelling. There has been no report ofSIGFPE issue, so I would not bother with that.

I do not think that a pedantic argument about NaN being the "correct"answer justifies taking such portability risks, even if I bought thatargument, which I don't particularly.

ISTM that the portability risk, i.e. about SIGFPE issue, would require anon IEEE 754 conforming box, and there has been no complaint yet aboutthat.

It's also inconsistent with this basic decision in printResults:

        /* Remaining stats are nonsensical if we failed to execute any xacts */
        if (total->cnt <= 0)
                return;

According to praise, this decision was by Tom Lane. It is about the wholeexecution not executing anything, which may be seen as a special errorcase, whereas a progress report without xact execution within some periodcan be perfectly normal.

If we thought NaN was the "correct" answer for no xacts then we'd just
bull ahead and print NaNs all over the place.


Why not, but for me the issue wrt to the final report is distinct.

I think we would be best advised to deal with this by printing zeroes inthe progress-report case, and by skipping the output altogether inprintSimpleStats. (Or we could make printSimpleStats print zeroes; I'mindifferent to that choice.)

I disagree with printing a zero if the measure is undefined. Zero does notmean undefined. "Zero means zero":-)

Maybe we could replace platform-specific printing for "NaN" with somethingelse, either a deterministic "NaN" or some other string, eg "-" or "?" orwhatever. The benefit with sticking to some "NaN" or some platformspecific NaN is that processing the output can retrieve it as a float.

So basically I would have stayed with the current version which seemsconsistent to me, but I agree that it is debatable. I'm not sure it isworth being "exceedingly unhappy" about it, because these are specialcorner cases anyway, not likely to be seen without somehow looking for itor having pretty strange pgbench runs.

Attached is an updated patch addressing these points.  Comments?


I somehow disagree with your points above, for various reasons.

However, you are a committer, and I think best that the bug is fixed evenif I happen to disagree with unrelated changes. As an academic, I probablyhave a number of silly ideas about a lot of things. This is part of thejob description:-)


--
Fabien.

Re: [HACKERS] pgbench regression test failure

Reply via email to