Re: [Qemu-devel] [PATCH for-1.5 0/9] Disable expensive QOM cast debugging for official releases

Paolo Bonzini Fri, 10 May 2013 14:00:10 -0700

Il 10/05/2013 19:41, Anthony Liguori ha scritto:
> Paolo Bonzini <pbonz...@redhat.com> writes:
> 
>> Il 10/05/2013 16:39, Anthony Liguori ha scritto:
>>> I just oppose the notion of disabling casts and *especially* only
>>> disabling casts for official builds.
>>
>> This actually happens all the time.  Exactly this kind of type-safe cast
>> is disabled in releases of GCC, but enabled when building from svn trunk.
> 
> Let's assume for a moment that you are right and this behavior is what
> we should have.  Let's also assume there is a real regression here
> which has yet to have been established.


Aurelien timed the effect of my patch two hours before you sent this
message.  If it's not a regression in 1.5 (which is quite obvious from
the profile), it is a regression from the introduction of CPU classes
(1.3 or 1.4), when this code didn't exist at all.

And in 1.5 we introduced virtio-net casts as well (or did mst sneak in
his change anyway? ;)). If 10% is the effect of a few hundred
interrupts/sec, perhaps the same effect is visible on a few thousand
packets/sec.  I wouldn't bet against that one week from release.

(In fact, I'm not going to bet against that after release, either.  I'll
propose to disable QOM cast checking in Fedora and RHEL).

> As soon as we open up 1.6 for development, we face an immediate
> regression in performance.  So we need to fix the real problem here
> anyway.

No, we don't.  We will simply have to live with a debugging vs.
production tradeoff.  You will need to remember using the appropriate
option when doing performance tests on development releases.  We can
print a message if necessary, but it's really common practice.  What
about lockdep or might_sleep debugging or similar checks that Fedora
only enables for the kernel during development phases?

> I strongly suspect that if there is a problem, that optimizing
> leaf/concrete casts will take care of it.  Otherwise, a small lookup
> cache ought to do the trick too.  We can even use pointer comparisons
> for the lookup cache which should make it extremely cheap.

Still more expensive than no code at all.

> Let's independently evaluate your proposal for 1.6.

No, my proposal has to be evaluated for 1.5, and perhaps reverted later.
 We have a potentially large *set* of regressions (large amounts of QOM
casts were introduced in 1.5) that we found after -rc1, and the big
hammer is the only way to deal with it.

> Of course, these changes never make it into the tree which is an
> indication that the mechanism works very well :-)

Yes, it works *great*.  I don't want to give the impression that I think
the opposite.  But how often have you seen them abort in the distro
QEMU, as opposed to a development version?  And how often have you seen
"CRITICAL: foo != NULL" or "CRITICAL: GTK_IS_WIDGET (foo)" from a GTK+
app?  For me, it is never vs. quite often.

It works great simply because it's very cheap to add and makes debugging
much nicer.

> Note that the cast macros are a big improvement in code readability.
> The only real replacement would be static casts which would downgrade
> safety.

Yes, we should keep the cast macros, no doubt about that.  And the
checks, for development.  But adding tracing, while removing the actual
checks, is a pretty good speed compromise IMHO.  And leaving the checks
in before the freeze matches what most developers probably desire, and
avoids bitrotting of the check code.

> If you want to debate the merits of the safety, that's fine.
> 
>> Type-safe casts make sense in GTK+/GObject where: 1) type-safe casts
>> return NULL and log a "critical" error, they do not abort;  2) all
>> functions fail with another "critical" error if they are passed NULL.
>> We do neither, so we're just trading a crash now for a crash very soon
>> after (our call stacks tend to be relatively shallow, with some
>> exceptions such as the block layer).
> 
> The big assumption here is that dereference a NULL results in
> predictable failure.  This is not always the case and can lead to
> security vulnerabilities.

That's not what I was talking about.  I was talking about code like this:

void
gtk_window_set_wmclass (GtkWindow *window,
                        const gchar *wmclass_name,
                        const gchar *wmclass_class)
{
  GtkWindowPrivate *priv;

  g_return_if_fail (GTK_IS_WINDOW (window));
  priv = window->priv;

  ...
}

that avoids NULL pointer dereferences before they happen, while still
doing the checks.

Anyway, yes: the checks can catch some security vulnerabilities.  But
they won't catch many uses-after-free, they won't catch wrong
refcounting, they won't catch the _actual_ vulnerabilities that we had,
nor probably the ones that we could have.  Every segfault we fix could
potentially become a guest->host exploit with enough skill, the guest
has control of an enormous part of the QEMU address space.  But we fix
segfaults, not QOM cast failures.

Before QOM casts, I don't remember seeing many such failures _in the
tree_.  They of course happen during development, but QOM casts really
help before patches are posted.

> If you are concerned about the performance, provide a concrete example
> of poor performance and I will fix it.
> 
> If we find one that is unfixable, then we should consider your
> proposal.

Nothing is unfixable.  Worst of all, you can just stick static C casts
in the hot paths.  But that's what I want to avoid, I want my brain to
concentrate on the code on the hot paths, not on pointless differences
between those parts and the rest of a device.

Paolo

Re: [Qemu-devel] [PATCH for-1.5 0/9] Disable expensive QOM cast debugging for official releases

Reply via email to