[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Evan Layton Fri, 21 Aug 2009 10:19:05 -0600

After thinking about this a bit more, I was getting bogged down in the idea 
that 
is was a requirement to keep the interfaces the same. However this was causing 
us to work around things a way that limited how we would do things going 
forward. I think we should be doing something along the lines of using a 
descriptor to pass the error information.


While this requires a change to the interface it also gives us the ability to 
make this more extensible as well as opaque.

I'm updating the document now to address this issues and will send out an 
updated version when I have that complete.

Thanks!
-evan

Evan Layton wrote:
> My apologies for not getting all of the responses in the first email, I 
> hit send too soon...
> 
> johansen at sun.com wrote:
>> On Thu, Aug 20, 2009 at 04:31:21PM -0600, Evan Layton wrote:
>>>> In my high-level comments, I observed that this structure doesn't allow
>>>> libbe to grow or add any other error information beyond what's provided
>>>> here.  At a minimum, there should be a field, or linked list, that 
>>>> allow
>>>> callers in the error handling chain to add ancillary information as an
>>>> operation progresses.  The initial failure may be that a device was in
>>>> use, but as the error is propagated up the call chain, there may be
>>>> other failures that occur too.  How do you plan to handle this case?
>>>> Augmenting the error object with other information is one approach, but
>>>> there are certainly others.
>>> Anything that happens after this failure it not something we 
>>> necessarily want to pass back back through to a user. We don't want 
>>> to give them more information than they really need and should stop 
>>> at the first error that caused us to fail. This was discussed with 
>>> Frank and it was his suggestion that we don't provide too much 
>>> information and confuse the user.
>>
>> I'm not sure this adresses my comment, really.  What if it would be
>> useful to add additional failure information in the call chain?  It
>> could help with debugging in some situations.  Your plan is not to
>> handle this at all?  What if you get two distinct errors performing an
>> operation.  Should you chain them together, raise them separately, or
>> just report the first?
> 
> I originally had the idea of what I termed "clean-up" errors and error 
> strings for errors that may happen while doing any cleanup needed due to 
> hitting the original error. I had removed them based on comments from 
> Frank, however can see your point now that just because we keep track of 
> this information does not mean that we have to display that to the user...
> 
>>
>> Given the response, I should point out that users may not be the primary
>> consumer of this information.  Yes, we need to pretty print an error
>> message at some point, but obtaining relevant and useful debugging /
>> problem solving information is critical.  Other frameworks may want to
>> use this to determine how to handle more complicated error conditions.
> 
> OK I think this is the point I was not getting a clue on the first time 
> around. What you're saying is that the information we may want to 
> collect may be much more extensive that just the information we would 
> pass to a user.
> 
> I guess the kind of thing you're thinking of is, that with this added 
> information it may be possible for consumers of say libbe to get back 
> enough information that things like beadm or pkg may be able to take 
> corrective action based on what we pass back, fix the problem and the 
> user doesn't have to do anything to fix the issue.
> 
> I can agree with this and I'll look at this a bit more. However I'm not 
> sure how much of this is really out of scope for this project.
> 
>>
>>>> There are other extensibility issues to consider.  In particular, this
>>>> format is fixed to contain 5 pointers to char *.  I think it would make
>>>> more sense to define a generic err_info structure that contains a code
>>>> that describes the type of error that follows. 
>>> Error code are separate from this error structure and are returned by 
>>> the libbe functions themselves. Changing this would cause a major 
>>> interface change and require all the consumers of the current library 
>>> and python bridge to change.
>>
>> I'm not trying to suggest that you move your return codes into this
>> structure, rather I'm suggesting that one possibility for making the
>> format extensible is to use a value to determine what type of structure
>> you're looking at.  That way it's possible to have multiple types of
>> err_info_t's that contain different information about different errors,
>> if it ever becomes necessary to build different classes of error
>> information.
> 
> So what you're really referring to here is more along the idea of having 
> a different structure for each type of error would could possibly hit? 
> Or is it more that we would, for example, have different structures for 
> things like errors internal to the library (libbe), a structure for zfs 
> errors and one for other things outside the library? At this point I 
> don't see the need for that as far a this project since these are all 
> fairly similar. Also the way libbe handle errors is when we hit a 
> failure we stop, clean up and return so the only errors that may also 
> need to captured are any errors while cleaning up. One example of this 
> would be something like a BE creation that fails doing the dataset 
> cloning and then destroying all the snapshots for it also fails. We 
> would want to gather that information as well.
> 
>>
>>>> The sockaddr structure is a good example of something similar.
>>>> Depending upon what socket method is being used, the pointer to
>>>> sockaddr is copied in and then read as a particular structure.  I can
>>>> imagine that we might want to have multiple different types of
>>>> err_info structures, but the current approach precludes that option.
>>> I still don't see why you would need more than one type since you 
>>> will be puuting the error information you want into the available 
>>> strings and passing backe the error code separately from the 
>>> structure. You can put what ever you want into the strings within the 
>>> structure.
>>
>> I think you've misunderstood me.  We may both think that five strings
>> are sufficient to solve the problem today.  The point is that in the
>> future we may want six; or three strings, two integers, a double, and a
>> pointer to another struct.  The idea is that you want a structure that
>> is flexible so that if the error reporting needs change, you can deliver
>> the proper information to libbe's callers without requiring them to
>> change the way they use the interface.  What I'm trying to ask is,
>> "What's your approach for coping with both planned and unexpected 
>> additions
>> to this structure?"
> 
> This in relationship to you other comments now make more sense to me. We 
> may want to return as much information as possible to the consumer of 
> the library. It appears that what you are asking for is that we return 
> information from the library to the consumer that includes not just 
> information about what failed and what the error string may have been 
> but also information that is really internal to the library like maybe 
> what the contents of a sockaddr was or the zfs handle of the dataset 
> that is failing. I think for the orposes of this project that is a bit 
> out of scope but is definitely something we should be thinking about for 
> the Caiman Unified Design (CUD) and the error handling it is intended to 
> do.
> 
> Please keep in mind that the error handling we are talking about here is 
> for libbe only and does not address the full error handling for CUD. 
> While the ideas you've expressed are definitely things to think about 
> for CUD they are not all things we will be able to address here.
> 
> The idea of adding several types of error structures is a good idea but 
> I'd suggest that these be based on the type of consumer of the error 
> handler. In other word there would be libbe, AI, DC, etc. structures 
> that would be able to handle the different type of errors for these 
> components. We would want to be able to make it fairly strait forward to 
> add need types. However this is also out of scope for this project.
> 
> The one thing I do agree with you here is the need to have the structure 
> that we use be something that can be changed without too much pain as we 
> move from what we are doing here to what will be done for CUD.
> 
> To answer your question on "coping with both planned and unexpected 
> additions" I don't think I have a complete answer yet. Any suggestions 
> in addition to what you've already mentioned are definitely more than 
> welcome!!!
> 
> 
>>
>>>> In addition to refining the interface, I would make sure that you 
>>>> have a
>>>> good way of constructing and destroying these objects.  If they're tied
>>>> to a boot-environment specifc handle that's great.  Otherwise, you'll
>>>> have to make sure that the caller calls some destroy_my_error_object()
>>>> routine to get these cleaned up.  It's not optimal, especially since
>>>> some applications are haphazardly constructed.
>>> That's one of the ways of doing this that Ethan and I had been talking
>>> about yesterday. The idea of creating a handle to contain the structure
>>> and we'd be able to pass the information back inside the handle and free
>>> up the memory when the handle is destroyed. The only problem we have
>>> with this right now is that all of the interfaces use nvlists so passing
>>> a handle from the library and back into it would also require an
>>> interface change that would brake anything already using the library.
>>
>> I guess I was thinking a bit more generically, but what you describe
>> makes sense.  Typically, the handle is an opaque struct that contains
>> object specific information.  If you had a handle per-be and did your
>> operations in terms of a be, then I suppose it might be easier to
>> implement this.
>>
>> On the other hand, if the majority of the libbe consumers are going to
>> want to change to take advantage of the additonal error reporting,
>> making the change to use handles at the same time might not be so
>> disruptive.
> 
> One of the requirements we have at the moment is that adding this should 
> not break existing callers of the library. The use of something like a 
> global structure or something those lines may be one way to allow us to 
> do that.
> 
>>
>> Thanks,
>>
>> -j
> 
> _______________________________________________
> caiman-discuss mailing list
> caiman-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

[caiman-discuss] [pkg-discuss] cross-project review: Draft BE Error observability design

Reply via email to