On 05/03/2011 07:51 AM, Jes Sorensen wrote:
On 04/21/11 15:55, Michael Roth wrote:
Did you do anything with the fsfreeze patches, or were they dropped in
the migration to qapi?
They were pending some changes required on the agent side that weren't
really addressed/doable until this patchset, namely:
1) server-side timeout mechanism to recover from RPCs that can hang
indefinitely or take a really long time (fsfreeze, fopen, etc),
currently it's 30 seconds, may need to bump it to 60 for fsfreeze, or
potentially add an RPC to change the server-side timeout
2) a simple way to temporarily turn off logging so agent doesn't
deadlock itself
3) a way to register a cleanup handler when a timeout occurs.
4) disabling RPCs where proper accounting/logging is required
(guest-open-file, guest-shutdown, etc)
#4 isn't implemented... I think this could be done fairly non-invasively
with something like:
    Response important_rpc():
        if (!ga_log("syslog", LEVEL_CRITICAL, "important stuff happening"))
            return ERROR_LOGGING_CURRENTLY_DISABLED
Either that, or maybe simply disable the full command while the freeze
is in progress? I fear we're more likely to miss a logging check than
we are to miss disabling a command.
It should still be very non-invasive: maybe just a flag in the struct
declaring the functions that marks a command as logging-required, and if
the no-logging flag is set, the command is made to wait or returns -EAGAIN.
Yup, when I actually started dropping it in I realized this was a much
better approach. Although, for now I just added something like "if
(!logging_enabled) { error_set(QERR_GA_LOGGING_DISABLED); return }" to
the start of functions where logging is considered critical, which will
result in the user getting an error message about logging so it's not
too much of a surprise to them.
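As a rough illustration of the per-function guard described above, here is a minimal sketch in C. The error enum, the global `logging_enabled` flag, and the function name `guest_shutdown_sketch` are stand-ins invented for the example; the agent itself reports the condition through `error_set(QERR_GA_LOGGING_DISABLED)` rather than a return code.

```c
#include <stdbool.h>

/* Hypothetical error codes, used here in place of the agent's QError path. */
enum ga_err { GA_OK = 0, GA_ERR_LOGGING_DISABLED = 1 };

/* Assumed global: cleared while the filesystem is frozen, set again on thaw. */
static bool logging_enabled = true;

/* Stand-in for a command whose activity must be logged. */
static enum ga_err guest_shutdown_sketch(void)
{
    if (!logging_enabled) {
        /* Refuse up front so the caller is told why nothing happened. */
        return GA_ERR_LOGGING_DISABLED;
    }
    /* ... log the request, then do the real work ... */
    return GA_OK;
}
```

The cost of this style is that every logging-critical command has to remember to carry the check.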
The actual dispatch code closely mirrors Anthony's dispatch stuff for
QMP so I was hesitant to try to modify it to handle this automatically,
since it would require some changes to how the schema parsing/handling
is done (would probably need to add a "requires_logging" flag in the
schema). Wouldn't take much though. Either way, should be a clean
conversion if we decide to go that route.
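For comparison, the flag-based alternative could look roughly like the sketch below. It does not reflect the actual QMP-style dispatch or schema code: the `struct ga_cmd` table, `ga_dispatch`, and the `guest-ping` entry are assumptions made for illustration, and only the `requires_logging` idea and the -EAGAIN return come from the discussion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*ga_cmd_fn)(void);

/* Hypothetical command descriptor carrying the proposed flag. */
struct ga_cmd {
    const char *name;
    ga_cmd_fn fn;
    bool requires_logging;
};

/* Assumed global: cleared while the filesystem is frozen. */
static bool logging_enabled = true;

static int cmd_ping(void) { return 0; }
static int cmd_shutdown(void) { return 0; }

static const struct ga_cmd ga_cmds[] = {
    { "guest-ping",     cmd_ping,     false },
    { "guest-shutdown", cmd_shutdown, true  },
};

/* Look the command up and enforce the flag in one place. */
static int ga_dispatch(const char *name)
{
    for (size_t i = 0; i < sizeof(ga_cmds) / sizeof(ga_cmds[0]); i++) {
        const struct ga_cmd *cmd = &ga_cmds[i];
        if (strcmp(cmd->name, name) != 0) {
            continue;
        }
        if (cmd->requires_logging && !logging_enabled) {
            return -EAGAIN; /* caller may retry after thaw */
        }
        return cmd->fn();
    }
    return -ENOENT; /* unknown command */
}
```

With the check in the dispatcher, a new command only needs the right flag value in its table entry, which is the "harder to miss" property argued for earlier in the thread.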
    bool ga_log(log_domain, level, msg):
        if (log_domain == "syslog")
            if (!logging_enabled && is_critical(level))
                return False;
            syslog(msg, ...)
        else
            if (logging_enabled)
                normallog(msg, ...)
        return True
With that I think we could actually drop the fsfreeze stuff in. Thoughts?
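The ga_log pseudocode above can be written out as compilable C along these lines. This is only a sketch: the two log sinks are stubs that count calls and print to stderr (the names `syslog_stub` and `normallog_stub`, the counters, and the two-value level enum are all assumptions), and it mirrors the branches as written, including that a non-critical "syslog" message is still passed through while logging is disabled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels; only LEVEL_CRITICAL appears in the thread. */
enum ga_log_level { LEVEL_INFO, LEVEL_CRITICAL };

/* Assumed global: cleared while the filesystem is frozen. */
static bool logging_enabled = true;

/* Call counters so the stubbed sinks can be observed. */
static int syslog_count;
static int normallog_count;

static bool is_critical(enum ga_log_level level)
{
    return level == LEVEL_CRITICAL;
}

/* Stub standing in for the real syslog path. */
static void syslog_stub(const char *msg)
{
    syslog_count++;
    fprintf(stderr, "syslog: %s\n", msg);
}

/* Stub standing in for the agent's normal log file. */
static void normallog_stub(const char *msg)
{
    normallog_count++;
    fprintf(stderr, "log: %s\n", msg);
}

/* Returns false only when a critical syslog message cannot be recorded. */
static bool ga_log(const char *log_domain, enum ga_log_level level,
                   const char *msg)
{
    if (strcmp(log_domain, "syslog") == 0) {
        if (!logging_enabled && is_critical(level)) {
            return false;
        }
        syslog_stub(msg);
    } else if (logging_enabled) {
        normallog_stub(msg);
    }
    return true;
}
```

A caller such as important_rpc() would then bail out with a logging-disabled error whenever ga_log() returns false.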
IMHO it is better to disable the commands rather than just logging, but
either way should allow it to drop in.
Kinda agree, but logging seems to be the real dependency. With the
server-side timeouts now in place even doing stuff like fopen/fwrite is
permitted (it would just time out if it blocked too long). It's the
logging stuff that we don't really have a way to recover from, because
it's not run in a thread we can just nuke after a certain amount of time.
Even when we're not frozen, we can't guarantee an fopen/fwrite/fread
will succeed, so failures shouldn't be too much of a surprise since they
need to be handled anyway. And determining whether or not a command
should be marked as executable during a freeze is somewhat nebulous
(fopen might work for read-only access, but hang for write access when
O_CREAT is set, fwrite might succeed if it doesn't require a flush,
etc), plus internal things like logging need to be taken into account.
So, for now at least I think it's a reasonable way to do it.
Sorry for the late reply, been a bit swamped here.
No problem, I have your patches in my tree now. They still need a little
bit of love and testing but I should be able to get them out on the list
shortly.
Cheers,
Jes