Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#3] QEMU 5.0 and 5.1-pre-soft-freeze Dissect Comparison

2020-07-10 Thread Philippe Mathieu-Daudé
On 7/10/20 11:20 AM, Ahmed Karaman wrote:
> On Thu, Jul 9, 2020 at 4:41 PM Alex Bennée  wrote:
>>
>> If you identify a drop in performance due to a commit, linking to it
>> from the report wouldn't be a bad idea, so that those who want to
>> quickly replicate the test can do before/after runs.
>>
> 
> Report number 5 will introduce a new tool for detecting commits
> causing performance improvements and degradations. The report will
> utilize this tool to find out the specific commit introducing these
> changes.

Great news! Looking forward to testing/using it :)

>>>
>>> Previous reports:
>>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>> Report 2 - Dissecting QEMU Into Three Main Parts:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
>>>
>>> Best regards,
>>> Ahmed Karaman



2020-07-10 Thread Alex Bennée


Ahmed Karaman  writes:

> On Thu, Jul 9, 2020 at 4:41 PM Alex Bennée  wrote:
>>
>>
>> Ahmed Karaman  writes:
>>
>> > Hi,
>> >
>> > The third report of the TCG Continuous Benchmarking series utilizes
>> > the tools presented in the previous report for comparing the
>> > performance of 17 different targets across two versions of QEMU. The
>> > two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
>> > of QEMU).
>> >
>> > After summarizing the results, the report utilizes the KCachegrind
>> > tool and dives into the analysis of why all three PowerPC targets
>> > (ppc, ppc64, ppc64le) had a performance degradation between the two
>> > QEMU versions.
>>
>> It's an interesting degradation especially as you would think that a
>> change in the softfloat implementation should hit everyone in the same
>> way.
>>
>
> That's what I thought as well, but while working on next week's
> report, I found that this specific change introduced a performance
> improvement in other targets!
>
>> We actually have a tool for benchmarking the softfloat implementation
>> itself called fp-bench. You can find it in tests/fp. I would be curious
>> to see if you saw a drop in performance in the following:
>>
>>   ./fp-bench -p double -o cmp
>>
>
> I ran the command before and after the commit introducing the
> degradation. Both runs gave results varying between 600 and 605 MFlops.
> Running with Callgrind and the Coulomb benchmark, the results were:
> Number of instructions before: 12,715,390,413
> Number of instructions after: 13,031,104,137

You may have to average over several runs to see if there is a
detectable change. It could be that, although more instructions are
being executed, it makes no practical difference to the execution time
because the processor is just as efficient at scheduling the work to
the execution units.
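
As a rough sketch of what averaging over several runs could look like,
the Python helpers below run a benchmark command repeatedly and report
the mean and sample standard deviation. This assumes each run prints a
figure followed by "MFlops" on stdout; the helper names (parse_mflops,
summarize, bench) are made up for illustration and are not part of QEMU.

```python
import re
import statistics
import subprocess

# Assumption: each run prints a line containing "<number> MFlops".
MFLOPS_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*MFlops")


def parse_mflops(output):
    """Extract the MFlops figure from the output of a single run."""
    match = MFLOPS_RE.search(output)
    if match is None:
        raise ValueError("no MFlops figure found in: %r" % output)
    return float(match.group(1))


def summarize(samples):
    """Return (mean, sample standard deviation) of the MFlops samples."""
    mean = statistics.mean(samples)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


def bench(cmd, runs=10):
    """Run cmd `runs` times and summarize the reported MFlops."""
    samples = []
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True,
                                check=True)
        samples.append(parse_mflops(result.stdout))
    return summarize(samples)
```

Calling bench(["./fp-bench", "-p", "double", "-o", "cmp"]) from
tests/fp in the "before" and "after" builds gives two (mean, stdev)
pairs; if the means differ by less than the spread, the change is
probably not detectable with this benchmark.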

You have to remember that on modern processors the relationship between
instructions and the utilisation of the eventual ALUs is tenuous at
best. After everything has been converted to uOps and scheduled, you
might be doing broadly the same calculations. Pipeline and cache stalls
are probably a more important metric here, although I doubt they figure
much in the very tight loop of the benchmark.

>
>> >
>> > Report link:
>> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/
>>
>> If you identify a drop in performance due to a commit, linking to it
>> from the report wouldn't be a bad idea, so that those who want to
>> quickly replicate the test can do before/after runs.
>>
>
> Report number 5 will introduce a new tool for detecting commits
> causing performance improvements and degradations. The report will
> utilize this tool to find out the specific commit introducing these
> changes.

Excellent - keep up the good work ;-)

>
>> >
>> > Previous reports:
>> > Report 1 - Measuring Basic Performance Metrics of QEMU:
>> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>> > Report 2 - Dissecting QEMU Into Three Main Parts:
>> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
>> >
>> > Best regards,
>> > Ahmed Karaman
>>
>>
>> --
>> Alex Bennée
>
> Best regards,
> Ahmed Karaman


-- 
Alex Bennée



2020-07-10 Thread Ahmed Karaman
On Thu, Jul 9, 2020 at 4:41 PM Alex Bennée  wrote:
>
>
> Ahmed Karaman  writes:
>
> > Hi,
> >
> > The third report of the TCG Continuous Benchmarking series utilizes
> > the tools presented in the previous report for comparing the
> > performance of 17 different targets across two versions of QEMU. The
> > two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
> > of QEMU).
> >
> > After summarizing the results, the report utilizes the KCachegrind
> > tool and dives into the analysis of why all three PowerPC targets
> > (ppc, ppc64, ppc64le) had a performance degradation between the two
> > QEMU versions.
>
> It's an interesting degradation especially as you would think that a
> change in the softfloat implementation should hit everyone in the same
> way.
>

That's what I thought as well, but while working on next week's
report, I found that this specific change introduced a performance
improvement in other targets!

> We actually have a tool for benchmarking the softfloat implementation
> itself called fp-bench. You can find it in tests/fp. I would be curious
> to see if you saw a drop in performance in the following:
>
>   ./fp-bench -p double -o cmp
>

I ran the command before and after the commit introducing the
degradation. Both runs gave results varying between 600 and 605 MFlops.
Running with Callgrind and the Coulomb benchmark, the results were:
Number of instructions before: 12,715,390,413
Number of instructions after: 13,031,104,137
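
To put the two Callgrind counts in proportion, a small Python sketch
computes the absolute and relative change. The two counts are the ones
quoted above; the function name relative_change is only illustrative.

```python
# Callgrind instruction counts for the Coulomb benchmark (from above).
before = 12_715_390_413
after = 13_031_104_137


def relative_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


delta = after - before
pct = relative_change(before, after)
print(f"{delta:,} more instructions ({pct:.2f}% increase)")
# prints "315,713,724 more instructions (2.48% increase)"
```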

> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/
>
> If you identify a drop in performance due to a commit, linking to it
> from the report wouldn't be a bad idea, so that those who want to
> quickly replicate the test can do before/after runs.
>

Report number 5 will introduce a new tool for detecting commits
causing performance improvements and degradations. The report will
utilize this tool to find out the specific commit introducing these
changes.

> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> > Report 2 - Dissecting QEMU Into Three Main Parts:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée

Best regards,
Ahmed Karaman



2020-07-09 Thread Alex Bennée


Ahmed Karaman  writes:

> Hi,
>
> The third report of the TCG Continuous Benchmarking series utilizes
> the tools presented in the previous report for comparing the
> performance of 17 different targets across two versions of QEMU. The
> two versions addressed are 5.0 and 5.1-pre-soft-freeze (current state
> of QEMU).
>
> After summarizing the results, the report utilizes the KCachegrind
> tool and dives into the analysis of why all three PowerPC targets
> (ppc, ppc64, ppc64le) had a performance degradation between the two
> QEMU versions.

It's an interesting degradation especially as you would think that a
change in the softfloat implementation should hit everyone in the same
way.

We actually have a tool for benchmarking the softfloat implementation
itself called fp-bench. You can find it in tests/fp. I would be curious
to see if you saw a drop in performance in the following:

  ./fp-bench -p double -o cmp

>
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/QEMU-5.0-and-5.1-pre-soft-freeze-Dissect-Comparison/

If you identify a drop in performance due to a commit, linking to it
from the report wouldn't be a bad idea, so that those who want to
quickly replicate the test can do before/after runs.

>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> Report 2 - Dissecting QEMU Into Three Main Parts:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg09441.html
>
> Best regards,
> Ahmed Karaman


-- 
Alex Bennée