* Markus Armbruster (arm...@redhat.com) wrote: > "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > > > * Eric Blake (ebl...@redhat.com) wrote: > >> On 05/03/2016 06:26 AM, Markus Armbruster wrote: > >> > >> >>> + visit_type_int(vmdesc, "size", &size, &error_abort); > >> >>> + visit_start_list(vmdesc, "fields", NULL, 0, &error_abort); > >> >>> + visit_start_struct(vmdesc, NULL, NULL, 0, &error_abort); > >> >> > >> >> Please avoid error_abort in migration code, especially on the source > >> >> side. > >> >> You've got an apparently happily working VM, we must never kill it > >> >> while attempting migration. > >> > > >> > These functions cannot fail, and &error_abort is a concise way to > >> > express that. It's the same as > >> > > >> > visit_type_int(vmdesc, "size", &size, &err); > >> > assert(!err); > >> > >> &error_abort is ONLY supposed to be used to flag programming errors (ie. > >> they should never be reachable). I'm asserting that the errors don't > >> happen, and therefore this cannot make the migration fail - in other > >> words, this is NOT going to kill a VM that attempts migration. > > > > OK, but remember that I work on the basis that there are programming errors > > in both the migration code and the VMState descriptions for devices. > > If those break it still shouldn't kill the source. > > (Note this isn't just true of migration - we need to be careful about > > it in all cases where we're doing stuff to an otherwise happy VM). > > While you can safely recover from certain programming errors, you can't > do it in general. Worse, deciding whether recovery from a certain > programming error is safe can be intractable. > > Example: visit_type_enum(v, name, &enum_val, enum_str, &err), where v is > an output visitor. This can fail when enum_val is not a valid subscript > of enum_str[]. Can we recover safely? Assume that we can cleanly fail > the task at hand at this point of its execution. > > Perhaps enum_str[] doesn't match the actual enum. This is a programming > error. Failing the task is graceful degradation, and safe enough. > > But what if enum_str[] is fine, but enum_val got corrupted? Then > failing the task is still safe as long as enum_val isn't visible outside > the task. But if it is visible, all bets are off. The corruption can > spread, and do real damage. Can be the difference between a crash that > forces a reboot with a filesystem journal replay, and massive data > corruption. > > So, should we try to recover here? Assuming we want to, badly. If > analysis shows the possible causes of this error are safely isolated by > the recovery, yes. Without such analysis, the only prudent answer is > no. > > Real world examples typically deal with state more complex than just an > enum (all too often a thicket of pointers), and the safety argument gets > much hairier. > > If you want more tractable arguments, try Erlang.
And so my argument here is very simple; if we believe we have a corruption in migration data then we fail migration - I don't try and do anything clever about trying to bound what's broken. This isn't about getting formal/tractable arguments, it's about making a practical system. > >> > * Conditions where the JSON output visitor itself sets an error: > >> > > >> > - None. > >> > >> The JSON output visitor itself may be adding an error for an attempt to > >> output Inf or NaN for a floating point number - but since vmstate > >> doesn't use visit_type_number(), this is not possible. And if we are > >> really worried about it, then in my next spin of the patch I may make it > >> user-configurable whether we stick to strict JSON or whether we relax > >> things and output Inf/NaN anyways. > > > > If that's the only case, and you're already saying it doesn't use it, then > > I don't see there's a point in making that bit any more configurable. > > I listed all possible failures of the JSON output visitor upthread. > This is an additional failure we've considered. I'm wary of adding it > precisely because I do worry about upsetting apple carts like this one. Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK