[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

[email protected] Thu, 20 Aug 2009 16:08:29 -0700

On Thu, Aug 20, 2009 at 04:31:21PM -0600, Evan Layton wrote:
>> In my high-level comments, I observed that this structure doesn't allow
>> libbe to grow or add any other error information beyond what's provided
>> here.  At a minimum, there should be a field, or linked list, that allow
>> callers in the error handling chain to add ancillary information as an
>> operation progresses.  The initial failure may be that a device was in
>> use, but as the error is propagated up the call chain, there may be
>> other failures that occur too.  How do you plan to handle this case?
>> Augmenting the error object with other information is one approach, but
>> there are certainly others.
>
> Anything that happens after this failure it not something we necessarily 
> want to pass back back through to a user. We don't want to give them more 
> information than they really need and should stop at the first error that 
> caused us to fail. This was discussed with Frank and it was his 
> suggestion that we don't provide too much information and confuse the 
> user.


I'm not sure this adresses my comment, really.  What if it would be
useful to add additional failure information in the call chain?  It
could help with debugging in some situations.  Your plan is not to
handle this at all?  What if you get two distinct errors performing an
operation.  Should you chain them together, raise them separately, or
just report the first?

Given the response, I should point out that users may not be the primary
consumer of this information.  Yes, we need to pretty print an error
message at some point, but obtaining relevant and useful debugging /
problem solving information is critical.  Other frameworks may want to
use this to determine how to handle more complicated error conditions.

>> There are other extensibility issues to consider.  In particular, this
>> format is fixed to contain 5 pointers to char *.  I think it would make
>> more sense to define a generic err_info structure that contains a code
>> that describes the type of error that follows. 
>
> Error code are separate from this error structure and are returned by the 
> libbe functions themselves. Changing this would cause a major interface 
> change and require all the consumers of the current library and python 
> bridge to change.

I'm not trying to suggest that you move your return codes into this
structure, rather I'm suggesting that one possibility for making the
format extensible is to use a value to determine what type of structure
you're looking at.  That way it's possible to have multiple types of
err_info_t's that contain different information about different errors,
if it ever becomes necessary to build different classes of error
information.

>> The sockaddr structure is a good example of something similar.
>> Depending upon what socket method is being used, the pointer to
>> sockaddr is copied in and then read as a particular structure.  I can
>> imagine that we might want to have multiple different types of
>> err_info structures, but the current approach precludes that option.
>
> I still don't see why you would need more than one type since you will be 
> puuting the error information you want into the available strings and 
> passing backe the error code separately from the structure. You can put 
> what ever you want into the strings within the structure.

I think you've misunderstood me.  We may both think that five strings
are sufficient to solve the problem today.  The point is that in the
future we may want six; or three strings, two integers, a double, and a
pointer to another struct.  The idea is that you want a structure that
is flexible so that if the error reporting needs change, you can deliver
the proper information to libbe's callers without requiring them to
change the way they use the interface.  What I'm trying to ask is,
"What's your approach for coping with both planned and unexpected additions
to this structure?"

>> In addition to refining the interface, I would make sure that you have a
>> good way of constructing and destroying these objects.  If they're tied
>> to a boot-environment specifc handle that's great.  Otherwise, you'll
>> have to make sure that the caller calls some destroy_my_error_object()
>> routine to get these cleaned up.  It's not optimal, especially since
>> some applications are haphazardly constructed.
>
> That's one of the ways of doing this that Ethan and I had been talking
> about yesterday. The idea of creating a handle to contain the structure
> and we'd be able to pass the information back inside the handle and free
> up the memory when the handle is destroyed. The only problem we have
> with this right now is that all of the interfaces use nvlists so passing
> a handle from the library and back into it would also require an
> interface change that would brake anything already using the library.

I guess I was thinking a bit more generically, but what you describe
makes sense.  Typically, the handle is an opaque struct that contains
object specific information.  If you had a handle per-be and did your
operations in terms of a be, then I suppose it might be easier to
implement this.

On the other hand, if the majority of the libbe consumers are going to
want to change to take advantage of the additonal error reporting,
making the change to use handles at the same time might not be so
disruptive.

Thanks,

-j

[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Reply via email to