Quick initial high-level feedback, since I'm afraid real review will take a while (series is long, and I'm still swamped).
Peter Crosthwaite <crosthwaitepe...@gmail.com> writes: > Hi Markus and all, > > This patch series adds support for automatically concatenating multiple > errors to the one Error *. > > I'll start with what I am actually trying to do, which is get rid of the > boilerplate: > > some_api_call(... , &local_err); > if (local_err) { > error_propagate(errp, local_err); > return; > } > some_similar_api_call(... , &local_err); > if (local_err) { > error_propagate(errp, local_err); > return; > } > some_similar_api_call(... , &local_err); > if (local_err) { > error_propagate(errp, local_err); > return; > } > ... > > It is usually 1 LOC for the API and 4 LOC for the error handling boiler > plate making our code hard reading. This is particularly bad in hw/arm > where we have a good number of fully QOMified SoCs and machine models, > each of which need these checks on recursive realize calls and friends. The problem is real. A way to avoid or even reduce the boiler plate would be nice, provided it's reasonably hard to misuse, and doesn't mess with the current usage patterns of the Error interface. > The removed LOC just in ARM pays for the extra core code needed to > implement this. And the number of ARM machines is only going to grow. > > So the plan is: > > 1: Allow an Error * to contain more that one actual error from API > calls. Can you explain *why* you want to support multiple errors? > 2: Refactor key APIs (some_similar_api_call() in the above example) > to not fatal when previous errors have occured in dependencies. I'm afraid this is going to be non-trivial :) For every place that now continues after errors, you need to show that continuing is safe. > Point 1 kind of got big on me. Patch 4 is the main event, listifying > errors. The follow up issue however, is it tricky to get a sane > definition of error_get_pretty for a multi-error. So instead the > strategy is to remove error_get_pretty() and replace with some error > API helper with well defined behaviour for multi-error. The two leading > uses of error get pretty are prefixing an error, and reporting an error > via a custom printf handler. So two new APIs are added for that (P5-6). > There aren't many error_get_pretty's left after that, and they > shouldn't be in the path of any multi-errors. > > I think the error_prefix is valuable it its own right, as it now means > the code for report or propagating a prefixed error is now consistent > with the non-prefixed variants. > > That is, we used to have: > > /* If we are prefixing we have to do it this way: */ > error_setg(errp, "some prefix %s", error_get_pretty(local_err)); > error_free(local_err); > > vs: > > /* but if not prefixing it is like this: */ > error_propagate(errp, local_err); > > Now with this patch series the two are much more recognisable as the same > with: > > /* This code is almost the same as the above non-prefixed propagation */ > error_prefix(local_err, "some prefix"): > error_propagate(errp, local_err); Less flexible, but that might be a good thing. Need to see the actual uses to tell. > Point 2 is less about error API and more about machine generation. > Sysbus, Memory and Qdev all have APIs that depend on successful device- > init and realize calls for argument devices. As we are trying to remove > the error detection for those argument devs, those APIs need to tweaked > to handle realize failure. This actually wasn't as bad as I thought it > would be. See patches (7-9). > > All patches after that walk the various major subsystems converting > error APIs to this new scheme and deleting now-unneeded boiler plate. > ARM is first (P10-15) seeing good clean up of propagate handers. > > So the net result for these ARM machines, is error behaviour that is > something like a compiler. If any one thing fails, then machine-init > (compilation) fails. But an early fail does not stop machine-init > (compilation), instead it proceeds to the end collecting subsequent > errors as it goes. Simple compilers stop on first error. Not as bad as it may sound when your machine gets from running "make" to compiler dying on the first error real fast. More ambitious compilers continue to diagnose more errors. This isn't trivial. The compiler has to satisfy post conditions even after an error, typically by synthesizing suitable error values. It has to take pains to avoid error cascades. Experienced users recognize when that effort fails, and typically ignore the remaining errors wholesale then. In QEMU, error cascades might be less of a problem than with compilers. To tell for sure, we'd have to try. However, satisfying post conditions is at least as much of a problem. More so since they're generally unstated. Can you explain your strategy for solving this one? > Some other interesting food for thought is the qemu_fdt APIs which I > have been wanting to convert to error API but the local_err propagation > is going to be brutal in heavy users like e500.c. This would solve that > as fdt API could be easily made multi-error safe and clients like e500 > can just collect multi-errors and single-fail at the end. > > Long term, we can use this to catch cases of multiple genuine machine > init errors in the one boot but that is a secondary goal to simply > cutting down on code boilerplate. The best feature of this series is > the diffstat. > > Patches 1-3 are cleanup that can be taken independent of the series. > > I think P3 may be obsolete from a recent merge, but i'll wait > for architectural feedback before rebasing. Sorry, only very high level feedback so far. Best I can do right now. Hope it's useful anyway.