Il 10/05/2013 19:41, Anthony Liguori ha scritto: > Paolo Bonzini <pbonz...@redhat.com> writes: > >> Il 10/05/2013 16:39, Anthony Liguori ha scritto: >>> I just oppose the notion of disabling casts and *especially* only >>> disabling casts for official builds. >> >> This actually happens all the time. Exactly this kind of type-safe cast >> is disabled in releases of GCC, but enabled when building from svn trunk. > > Let's assume for a moment that you are right and this behavior is what > we should have. Let's also assume there is a real regression here > which has yet to have been established.
Aurelien timed the effect of my patch two hours before you sent this message. If it's not a regression in 1.5 (which is quite obvious from the profile), it is a regression from the introduction of CPU classes (1.3 or 1.4), when this code didn't exist at all. And in 1.5 we introduced virtio-net casts as well (or did mst sneak in his change anyway? ;)). If 10% is the effect of a few hundred interrupts/sec, perhaps the same effect is visible on a few thousand packets/sec. I wouldn't bet against that one week from release. (In fact, I'm not going to bet against that after release, either. I'll propose to disable QOM cast checking in Fedora and RHEL). > As soon as we open up 1.6 for development, we face an immediate > regression in performance. So we need to fix the real problem here > anyway. No, we don't. We will simply have to live with a debugging vs. production tradeoff. You will need to remember using the appropriate option when doing performance tests on development releases. We can print a message if necessary, but it's really common practice. What about lockdep or might_sleep debugging or similar checks that Fedora only enables for the kernel during development phases? > I strongly suspect that if there is a problem, that optimizing > leaf/concrete casts will take care of it. Otherwise, a small lookup > cache ought to do the trick too. We can even use pointer comparisons > for the lookup cache which should make it extremely cheap. Still more expensive than no code at all. > Let's independently evaluate your proposal for 1.6. No, my proposal has to be evaluated for 1.5, and perhaps reverted later. We have a potentially large *set* of regressions (large amounts of QOM casts were introduced in 1.5) that we found after -rc1, and the big hammer is the only way to deal with it. > Of course, these changes never make it into the tree which is an > indication that the mechanism works very well :-) Yes, it works *great*. I don't want to give the impression that I think the opposite. But how often have you seen them abort in the distro QEMU, as opposed to a development version? And how often have you seen "CRITICAL: foo != NULL" or "CRITICAL: GTK_IS_WIDGET (foo)" from a GTK+ app? For me, it is never vs. quite often. It works great simply because it's very cheap to add and makes debugging much nicer. > Note that the cast macros are a big improvement in code readability. > The only real replacement would be static casts which would downgrade > safety. Yes, we should keep the cast macros, no doubt about that. And the checks, for development. But adding tracing, while removing the actual checks, is a pretty good speed compromise IMHO. And leaving the checks in before the freeze matches what most developers probably desire, and avoids bitrotting of the check code. > If you want to debate the merits of the safety, that's fine. > >> Type-safe casts make sense in GTK+/GObject where: 1) type-safe casts >> return NULL and log a "critical" error, they do not abort; 2) all >> functions fail with another "critical" error if they are passed NULL. >> We do neither, so we're just trading a crash now for a crash very soon >> after (our call stacks tend to be relatively shallow, with some >> exceptions such as the block layer). > > The big assumption here is that dereference a NULL results in > predictable failure. This is not always the case and can lead to > security vulnerabilities. That's not what I was talking about. I was talking about code like this: void gtk_window_set_wmclass (GtkWindow *window, const gchar *wmclass_name, const gchar *wmclass_class) { GtkWindowPrivate *priv; g_return_if_fail (GTK_IS_WINDOW (window)); priv = window->priv; ... } that avoids NULL pointer dereferences before they happen, while still doing the checks. Anyway, yes: the checks can catch some security vulnerabilities. But they won't catch many uses-after-free, they won't catch wrong refcounting, they won't catch the _actual_ vulnerabilities that we had, nor probably the ones that we could have. Every segfault we fix could potentially become a guest->host exploit with enough skill, the guest has control of an enormous part of the QEMU address space. But we fix segfaults, not QOM cast failures. Before QOM casts, I don't remember seeing many such failures _in the tree_. They of course happen during development, but QOM casts really help before patches are posted. > If you are concerned about the performance, provide a concrete example > of poor performance and I will fix it. > > If we find one that is unfixable, then we should consider your > proposal. Nothing is unfixable. Worst of all, you can just stick static C casts in the hot paths. But that's what I want to avoid, I want my brain to concentrate on the code on the hot paths, not on pointless differences between those parts and the rest of a device. Paolo