An additional argument for logging a struct and putting all the to-string logic somewhere else is that it would allow us to incorporate localization and internationalization relatively painlessly. If we have to pass around a string, it is much more painful to do this.
On Tue, Nov 10, 2015 at 12:28 PM, Michael Hopcroft <[email protected]> wrote: > There have been a lot of great suggestions on this thread. My > recommendation is that I do an experiment to uncover any hidden issues with > the macro approach outlined in my original email. I will try to design the > API of the ErrorImpl class so that it would be, not only possible, but easy > to add support for error chaining and message formatting, as has been > suggested on this thread. If I don’t encounter any significant problems > with the basic experiment, I will come back to this group with a patch and > we can then discuss whether to push the design further forward. I’m a big > fan of an error infrastructure because it gives us a centralized location > in the code to add debugging hooks and apply various policies such as > logging and formatting. > > On Fri, Nov 6, 2015 at 1:56 PM, Alex Clemmer <[email protected]> > wrote: > >> In general, I think a good answer to my question is for the place that >> is reporting or unboxing the error to aggregate data into an error >> struct that holds things like the "error stack" (and other semantics >> about the log, including perhaps error codes), and then let whoever is >> logging, reporting or printing the message, handle the pretty-printing >> semantics. This is a departure from our current log-the-error-message >> approach, where the error message acts as the sole communicator of >> error semantics and context, including the details about the line >> number. >> >> In fact, I think it's worth wondering whether the place where we >> create an error should even log the error message -- right now we add >> a lot of strings together, and it makes more sense to me to have a >> methodology for just populating an "error struct" with data about the >> error, and then the `toString` method (or whatever) can simply >> generate the error messages as necessary, in one holistic place. That >> way, we have centralized place that determining how an error gets >> transformed into a string, instead of a lot of ad hoc string >> concatenations and error unboxing. Another advantage is that you'd >> have very fine-grained control over how to respond to errors, as it is >> a typesafe struct. In fact, you could even imagine having a system >> where error stringifying is handled completely out-of-band, meaning >> that logging could become much, much cheaper in general. >> >> But anyway, more broadly, what I'm trying to say is that the place >> where we are returning an error should not know or care about how the >> downstream sites pretty-print it or log it. Those are separate >> concerns. >> >> On Fri, Nov 6, 2015 at 1:17 PM, Benjamin Mahler >> <[email protected]> wrote: >> > Thanks for the proposal Michael. This has come up in discussions before >> and >> > I think it's really valuable, so it's nice to see folks thinking about >> > this. Some things to consider: >> > >> > Don't forget 'Failure' which is unfortunately not unified with 'Error'. >> > >> > How does this affect error message composition (as Alex alluded to)? >> > Currently we do manual error message composition ("<Caller message>: >> > <callee message>: <callee's callee message>: ...") but in the future we >> may >> > add composition support to our synchronous types like Try,Result,etc to >> be >> > similar to Future composition with .then. See: >> > https://issues.apache.org/jira/browse/MESOS-785 >> > >> > Given composition, it sounds like we want to decouple the trace >> information >> > from the error message, so that we can store a "stack-trace" of multiple >> > code locations? >> > >> > It would be good to be aware of the limitations around Future composition >> > and failure messages. When a .then chain encounters a failure, no error >> > message wrapping occurs as the failure bubbles up. The result is that we >> > instead get "<bottom-most callee message>" as the failure message without >> > the caller wrapping I mentioned above. Tracing will help alleviate only >> > having the bottom most callee failure, but it would be great to think >> about >> > how to produce better messages as well (given MESOS-785 lands us in the >> > same boat for Try,Result,etc). >> > >> > Typed errors/failures has come up before as well, but might be orthogonal >> > enough to discuss separately. >> > >> > On Fri, Nov 6, 2015 at 1:16 AM, Alex Clemmer < >> [email protected]> >> > wrote: >> > >> >> I'm strongly in favor of moving down this path, or a path like it. >> >> >> >> As Mesos becomes increasingly important as core distributed systems >> >> infrastructure, it will become correspondingly helpful and important >> >> to have things like excellent core error reporting facilities, even if >> >> they come at the cost of some technical risk in the short term. In >> >> this case in particular, I believe this error reporting model, where >> >> the developer need only fill in semantic data about the error (e.g., >> >> the error message), and the boilerplate context of the error (e.g., >> >> the line number of the error) is filled in automatically and opaquely >> >> by the system, is clearly the "right" model. On the merits of simply >> >> adding line numbers itself, I think this is worth prototyping, and if >> >> anything, I'd ask for the model to go further -- I've worked on scale >> >> systems where errors are all "stringly typed" and the call sites are >> >> all tribal knowledge, and I've worked on systems where the errors are >> >> highly semantically regimented, and it is extremely clear to me that >> >> the latter is clearly, unequivocally better. >> >> >> >> It's true there are some subtleties to think through. We'll need to >> >> think about things like ABI compatibility, and we'll need to think >> >> about whether there are situations where we _don't_ want line numbers >> >> to be reported (e.g., we often "unbox" the error message and repackage >> >> it, and it's worth wondering whether these semantics hurt or help >> >> those error handling practices). It will indeed require a steady hand >> >> to make sure the needle is "threaded correctly" and we don't stomp on >> >> something important by accident. >> >> >> >> But I think it's worth taking the time to give this a shot anyway. I'm >> >> happy to do the code reviews myself, if that helps. :) >> >> >> >> On Fri, Nov 6, 2015 at 12:29 AM, Michael Hopcroft <[email protected]> >> >> wrote: >> >> > Error reporting in Mesos is based on Stout’s Error class and a >> variety of >> >> > other classes, such as ErrnoError and WindowsError, that derive from >> >> Error. >> >> > One goal of these classes is to simplify the construction and storage >> of >> >> > error message strings. As an example, the ErrnoError constructor >> >> > automatically incorporates std::string(strerror(errno)) into its error >> >> > message. >> >> > >> >> > >> >> > Stout catches exceptions and converts them into Try<T> and Result<T> >> >> > objects which are used as polymorphic return values that can express >> >> either >> >> > an error condition or a proper return value. One consequence of >> catching >> >> > exceptions is that call stacks from throw sites and error sites are >> lost. >> >> > This makes it harder to debug production environment crashes post hoc. >> >> > >> >> > >> >> > One way to improve debugability would be to add more information about >> >> the >> >> > error site to the error message. This could be done by convention, >> but it >> >> > seems that the vast majority of the error messages in Mesos currently >> >> don’t >> >> > include much information about the site that originally generated an >> >> error. >> >> > >> >> > >> >> > Because much of the code in Mesos uses Error and ErrnoError, we have >> an >> >> > opportunity to add diagnostic information automatically with only a >> small >> >> > amount of churn. What we’d need to do is provide Error creation macros >> >> that >> >> > would pass the source file name and line number to the constructors of >> >> the >> >> > various Error classes. >> >> > >> >> > >> >> > Concretely this would involve >> >> > >> >> > >> >> > - class Error >> >> > - Rename declaration and definition to ErrorImpl (or ErrorBase). >> >> This >> >> > change is not error prone. >> >> > - Rename references to class’ type. These changes are mostly >> >> limited >> >> > to result.hpp and try.hpp. >> >> > - Don’t rename calls to constructor. These will now just invoke >> the >> >> > macro. >> >> > - Modify the constructor to take a line number and source >> filename >> >> in >> >> > addition to a message. >> >> > - Define a macro called Error(message) that has the same >> prototype >> >> as >> >> > the original Error constructor. This macro would invoke >> >> > ErrorImpl::ErrorImpl, passing the current line number and source >> >> filename >> >> > along with the user-provided message. The macro would make use >> >> > of __FILE__ >> >> > and __LINE__, which are predefined in gcc and Visual Studio. >> >> > - class ErrnoError, WindowsError. >> >> > - Similar pattern as above. >> >> > >> >> > *Pros.* >> >> > >> >> > >> >> > - Codebase is more debuggable in production environment because the >> >> vast >> >> > majority of error sites will document the file name and line number >> >> in the >> >> > error message. >> >> > - Most error sites don’t require a code change. >> >> > >> >> > *Cons.* >> >> > >> >> > >> >> > - Introduces a macro. In general, we’d like to move away from >> macros >> >> > wherever possible. >> >> > - Discourages subclassing of Error because each subclass would need >> >> its >> >> > own special macro to call its constructor. A brief search of the >> code >> >> base >> >> > seems to indicate that Mesos programmers don’t tend to declare >> >> subclasses >> >> > of Error. Right now, it seems that subclasses exist mainly to >> provide >> >> > different construction semantics. The big question is whether one >> >> would >> >> > want to use subclasses down the road to distinguish between >> >> equivalence >> >> > classes of errors, say those that should bring the process down vs >> >> those >> >> > that are recoverable. >> >> > - Some amount of code churn, mainly in error.hpp, result.hpp, and >> >> > try.hpp. >> >> > >> >> > *Limitations.* >> >> > >> >> > >> >> > - This approach by itself won’t address error sites that invoke >> >> > convenience functions such as Try<T>::error() and >> Result<T>::error(). >> >> These >> >> > could be addressed with a similar macro, but it would need to >> somehow >> >> deal >> >> > with the template parameter as well. >> >> > >> >> > *Risks and Potential Complications.* >> >> > >> >> > >> >> > - The rename process might fail to correctly distinguish between >> error >> >> > sites (i.e. calls to Error::Error()) and other uses of the term >> >> “Error”. >> >> > - There may be important compilers that don’t support __FILE__ and >> >> > __LINE__. These predefined macros have been in the ANSI/ISO >> standard >> >> for a >> >> > while, but it would still be wise to check each compiler we care >> >> about. >> >> > - General hazards associated with macro expansion. >> >> > - Users of Mesos may have written diagnostic tools that depend on >> the >> >> > current values of the error strings. >> >> > - Unit tests may depend on exact values of error strings. If this >> were >> >> > the case, we'd have to provide a convenience function that strips >> off >> >> the >> >> > file names and line numbers so that unit tests don't break when >> error >> >> sites >> >> > move around due to edits elsewhere in the file. >> >> > >> >> > Let me know what you think of this idea. If the approach seems >> reasonable >> >> > and a worthwhile, I would be willing to prototype it to see if it has >> >> some >> >> > fatal flaw. Thanks for your input! >> >> > >> >> > >> >> > >> >> > -Mike >> >> >> >> >> >> >> >> -- >> >> Alex >> >> >> >> Theory is the first term in the Taylor series of practice. -- Thomas M >> >> Cover (1992) >> >> >> >> >> >> -- >> Alex >> >> Theory is the first term in the Taylor series of practice. -- Thomas M >> Cover (1992) >> -- Alex Theory is the first term in the Taylor series of practice. -- Thomas M Cover (1992)
