[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

[email protected] Fri, 21 Aug 2009 14:00:04 -0700

Evan,

On Thu, Aug 20, 2009 at 06:37:11PM -0600, Evan Layton wrote:
> My apologies for not getting all of the responses in the first email, I 
> hit send too soon...

No worries.

> johansen at sun.com wrote:
>> On Thu, Aug 20, 2009 at 04:31:21PM -0600, Evan Layton wrote:
>> Given the response, I should point out that users may not be the primary
>> consumer of this information.  Yes, we need to pretty print an error
>> message at some point, but obtaining relevant and useful debugging /
>> problem solving information is critical.  Other frameworks may want to
>> use this to determine how to handle more complicated error conditions.
>
> OK I think this is the point I was not getting a clue on the first time 
> around. What you're saying is that the information we may want to collect 
> may be much more extensive than just the information we would pass to a 
> user.
>
> I guess the kind of thing you're thinking of is, that with this added  
> information it may be possible for consumers of say libbe to get back 
> enough information that things like beadm or pkg may be able to take 
> corrective action based on what we pass back, fix the problem and the 
> user doesn't have to do anything to fix the issue.

Yes, exactly.  If beadm or pkg can fix the problem, then we may not have
to present a message to the user.  Obviously, not all errors can be
corrected, but it does give us the opportunity to deal with commonly
encountered ones.

> I can agree with this and I'll look at this a bit more. However I'm not 
> sure how much of this is really out of scope for this project.

Ok, but to keep with the theme in my previous e-mails, it would be good
if the design didn't preclude the ability to add this functionality
later.

>> I'm not trying to suggest that you move your return codes into this
>> structure, rather I'm suggesting that one possibility for making the
>> format extensible is to use a value to determine what type of structure
>> you're looking at.  That way it's possible to have multiple types of
>> err_info_t's that contain different information about different errors,
>> if it ever becomes necessary to build different classes of error
>> information.
>
> So what you're really referring to here is more along the idea of having 
> a different structure for each type of error we could possibly hit? Or 
> is it more that we would, for example, have different structures for 
> things like errors internal to the library (libbe), a structure for zfs 
> errors and one for other things outside the library?

No.  I'm suggesting that your structures have some amount of
self-describing data so that if you ever need to introduce multiple
types of errors, you won't have painted yourself into a corner.

This is what you're currently proposing:

        struct err_info_str {
                char *cmd_str;
                char *op_str;
                char *failed_at;
                char *fail_msg_str;
                char *fail_fixit_str;
        };

I'm suggesting that one approach to making this structure extensible is
to do the following:

        typedef enum err_info_type {
                ET_INFO_STR = 0,
                ET_SOME_OTHER_TYPE,
        } err_info_type_t;

        struct err_info {
                err_info_type_t ei_type;
        };

        struct err_info_str {
                err_info_type_t ei_type;
                char *ei_cmd_str;
                char *ei_op_str;
                char *ei_failed_at;
                char *ei_fail_msg_str;
                char *ei_fail_fixit_str;
        };

If you ensure that every err_info starts with its type information, then
to add another err info structure, all you have to do is add a value to
the enum, and build a new structure.  This assumes that you've written
the code that copes with these things to properly look at the type, of
course.
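
To make the "look at the type" part concrete, here is a rough sketch of
what a consumer might do.  The function name err_info_fail_msg() is
made up for illustration and is not part of any proposed libbe
interface; it only shows the general shape of dispatching on ei_type
before touching the type-specific members.

```c
#include <assert.h>
#include <stddef.h>

typedef enum err_info_type {
	ET_INFO_STR = 0,
	ET_SOME_OTHER_TYPE
} err_info_type_t;

/* Common header: every err_info variant starts with ei_type. */
struct err_info {
	err_info_type_t ei_type;
};

struct err_info_str {
	err_info_type_t ei_type;
	char *ei_cmd_str;
	char *ei_op_str;
	char *ei_failed_at;
	char *ei_fail_msg_str;
	char *ei_fail_fixit_str;
};

/*
 * Hypothetical helper: return the failure message if this err_info
 * carries one, or NULL for types this code does not understand.
 */
const char *
err_info_fail_msg(const struct err_info *ei)
{
	assert(ei != NULL);

	switch (ei->ei_type) {
	case ET_INFO_STR:
		/* Safe only because the tag says this is an err_info_str. */
		return (((const struct err_info_str *)ei)->ei_fail_msg_str);
	default:
		/* Unknown or newer type: the caller falls back gracefully. */
		return (NULL);
	}
}
```

The default case is the important bit: a caller built against an older
enum still behaves sensibly when it is handed a type it has never seen.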

In writing this example, I noticed that you don't prefix your structure
members with a value that indicates they're unique to the structure.  It
would be useful to do this.  It can help you with search/replace issues
later on.

>> I think you've misunderstood me.  We may both think that five strings
>> are sufficient to solve the problem today.  The point is that in the
>> future we may want six; or three strings, two integers, a double, and a
>> pointer to another struct.  The idea is that you want a structure that
>> is flexible so that if the error reporting needs change, you can deliver
>> the proper information to libbe's callers without requiring them to
>> change the way they use the interface.  What I'm trying to ask is,
>> "What's your approach for coping with both planned and unexpected additions
>> to this structure?"
>
> This, in relation to your other comments, now makes more sense to me. We 
> may want to return as much information as possible to the consumer of the 
> library. It appears that what you are asking for is that we return 
> information from the library to the consumer that includes not just 
> information about what failed and what the error string may have been but 
> also information that is really internal to the library like maybe what 
> the contents of a sockaddr was or the zfs handle of the dataset that is 
> failing. I think for the purposes of this project that is a bit out of 
> scope but is definitely something we should be thinking about for the 
> Caiman Unified Design (CUD) and the error handling it is intended to do.

This isn't actually what I was trying to convey.  Above, I gave one
example of how you could make the structure extensible, if you ever
decide to add more error types.  The sockaddr structure is another
similar approach, except that the information to determine what sockaddr
is being used is kept with the socket state.  In networking code, you
supply a sockaddr_in for AF_INET sockets and a sockaddr_un for AF_UNIX
sockets.  The structures are variable length and have different members,
but the interfaces the sockets code uses are flexible enough to cope with
different sockaddr structures depending upon the protocol.
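
For anyone who hasn't looked at the sockets interfaces lately, a small
sketch of the pattern follows.  The helper sockaddr_size_for_family()
is hypothetical and exists only to illustrate the idea; the structures
and the AF_* constants are the standard ones.  It relies on the address
family that each sockaddr also carries in sa_family, so one generic
pointer type can stand in for several differently shaped structures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: given a generic sockaddr pointer, report the
 * size of the concrete structure that its address family implies, or
 * 0 if this code does not know the family.
 */
size_t
sockaddr_size_for_family(const struct sockaddr *sa)
{
	assert(sa != NULL);

	switch (sa->sa_family) {
	case AF_INET:
		return (sizeof (struct sockaddr_in));
	case AF_UNIX:
		return (sizeof (struct sockaddr_un));
	default:
		return (0);
	}
}
```

Callers always pass a struct sockaddr pointer, and the code on the other
side works out which concrete structure it has actually been given.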

> Please keep in mind that the error handling we are talking about here is 
> for libbe only and does not address the full error handling for CUD. 
> While the ideas you've expressed are definitely things to think about for 
> CUD they are not all things we will be able to address here.
>
> To answer your question on "coping with both planned and unexpected 
> additions" I don't think I have a complete answer yet. Any suggestions in 
> addition to what you've already mentioned are definitely more than 
> welcome!!!

I've been trying to suggest three different possible approaches.  I was
hoping that you'd be able to look at the suggestions and determine if
one worked better than the rest.  I'll summarize if it helps.

1. Create structures that contain type information, so it's possible to
determine what you're looking at when you get the structure.

2. You can take an approach similar to the way the sockets code copes
with different sockaddr structures.  The catch with this is that your
accessor functions need to be able to determine the type of object that
you're looking at.

3. Declare the err_info_t entirely opaque and use an accessor
function based interface to get new properties as you add them.
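
As a rough illustration of option 3, something along these lines would
work.  Everything here apart from the err_info_t name is an assumption
made for the sketch: the err_info_prop_t enum, the EI_PROP_* values and
err_info_get_str() are hypothetical, and a real library would keep the
structure definition in a private source file rather than alongside the
public declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Public view: callers only ever see an incomplete type. */
typedef struct err_info err_info_t;

/* Hypothetical property keys; new ones are added before EI_PROP_COUNT. */
typedef enum err_info_prop {
	EI_PROP_CMD = 0,
	EI_PROP_OP,
	EI_PROP_FAILED_AT,
	EI_PROP_FAIL_MSG,
	EI_PROP_FIXIT,
	EI_PROP_COUNT
} err_info_prop_t;

const char *err_info_get_str(const err_info_t *ei, err_info_prop_t prop);

/* Private view: the layout can change without breaking callers. */
struct err_info {
	const char *ei_props[EI_PROP_COUNT];
};

/*
 * Return the requested string property, or NULL if the property is
 * unset or not known to this version of the library.
 */
const char *
err_info_get_str(const err_info_t *ei, err_info_prop_t prop)
{
	assert(ei != NULL);

	if ((int)prop < 0 || (int)prop >= EI_PROP_COUNT)
		return (NULL);
	return (ei->ei_props[prop]);
}
```

Since callers never see the layout, adding a sixth string (or an
integer, with its own accessor) doesn't change how existing callers use
the interface.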

HTH,

-j
