Re: [HACKERS] Error message style guide, take 2
I have added this email to CVS as src/tools/error_text. Any changes to it?

---

Tom Lane wrote:

I'm about to start going through the backend's elog() calls to update them to ereport() style, add error code numbers, polish wording, etc. So it's time to nail down our style guide for message wording. Attached is a revision of the draft that Peter posted on 14-March. Any further comments?

BTW, I'd like to SGML-ify this and put it into the developer's guide somewhere; any thoughts where exactly?

regards, tom lane

What goes where
---------------

The primary message should be short, factual, and avoid reference to implementation details such as specific function names. "Short" means "should fit on one line under normal conditions". Use a detail message if needed to keep the primary message short, or if you feel a need to mention implementation details such as the particular system call that failed. Both primary and detail messages should be factual. Use a hint message for suggestions about what to do to fix the problem, especially if the suggestion might not always be applicable.

For example, instead of

    IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
    (plus a long addendum that is basically a hint)

write

    Primary:    Could not create shared memory segment: %m
    Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o)
    Hint:       the addendum

RATIONALE: keeping the primary message short helps keep it to the point, and lets clients lay out screen space on the assumption that one line is enough for error messages. Detail and hint messages may be relegated to a verbose mode, or perhaps a pop-up error-details window. Also, details and hints would normally be suppressed from the server log to save space. Reference to implementation details is best avoided since users don't know the details anyway.

Formatting
----------

Don't put any specific assumptions about formatting into the message texts. Expect clients and the server log to wrap lines to fit their own needs. In long messages, newline characters (\n) may be used to indicate suggested paragraph breaks. Don't end a message with a newline. Don't use tabs or other formatting characters. (In error context displays, newlines are automatically added to separate levels of context such as function calls.)

RATIONALE: Messages are not necessarily displayed on terminal-type displays. In GUI displays or browsers these formatting instructions are at best ignored.

Quotation marks
---------------

English text should use double quotes when quoting is appropriate. Text in other languages should consistently use one kind of quotes that is consistent with publishing customs and computer output of other programs.

RATIONALE: The choice of double quotes over single quotes is somewhat arbitrary, but tends to be the preferred use. Some have suggested choosing the kind of quotes depending on the type of object according to SQL conventions (namely, strings single quoted, identifiers double quoted). But this is a language-internal technical issue that many users aren't even familiar with, it won't scale to other kinds of quoted terms, it doesn't translate to other languages, and it's pretty pointless, too.

Use of quotes
-------------

Use quotes always to delimit file names, user-supplied identifiers, and other variables that might contain words. Do not use them to mark up variables that will not contain words (for example, operator names).

There are functions in the backend that will double-quote their own output at need (for example, format_type_be()). Do not put additional quotes around the output of such functions.

RATIONALE: Objects can have names that create ambiguity when embedded in a message. Be consistent about denoting where a plugged-in name starts and ends. But don't clutter messages with unnecessary or duplicate quote marks.

Grammar and punctuation
-----------------------

The rules are different for primary error messages and for detail/hint messages:

Primary error messages: Do not capitalize the first letter. Do not end a message with a period. Do not even think about ending a message with an exclamation point.

Detail and hint messages: Use complete sentences, and end each with a period. Capitalize the starts of sentences.

RATIONALE: Avoiding punctuation makes it easier for client applications to embed the message into a variety of grammatical contexts. Often, primary messages are not grammatically complete sentences anyway. (And if they're long enough to be more than one sentence, they should be split into primary and detail parts.) However, detail and hint messages are longer and may need to include multiple sentences. For consistency, they should follow complete-sentence style even
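
The punctuation rules for primary messages in the draft above are mechanical enough to check in code. The following is only a rough sketch under stated assumptions: primary_message_ok() is a hypothetical helper, not anything in the PostgreSQL sources, and it is deliberately crude (for instance, it would also reject a message that legitimately begins with an upper-case SQL key word).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: returns true if a primary
 * message follows the draft's rules (first letter not capitalized, no tabs,
 * and no trailing period, exclamation point, or newline).
 */
bool
primary_message_ok(const char *msg)
{
    size_t      len;
    char        last;

    assert(msg != NULL);

    len = strlen(msg);
    if (len == 0)
        return false;               /* an empty message says nothing */
    if (isupper((unsigned char) msg[0]))
        return false;               /* "Do not capitalize the first letter" */
    if (strchr(msg, '\t') != NULL)
        return false;               /* "Don't use tabs" */

    last = msg[len - 1];
    return last != '.' && last != '!' && last != '\n';
}
```

A check like this could run over message strings at build time; detail and hint messages would need the opposite rules (capitalized, ending in a period).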
Re: [HACKERS] Error message style guide, take 2
Bruce Momjian [EMAIL PROTECTED] writes:

> I have added this email to CVS as src/tools/error_text. Any changes to it?

Waste of CVS space; the real documentation is in SGML:
http://developer.postgresql.org/docs/postgres/error-style-guide.html

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] Error message style guide, take 2
Oh, removed.

---

Tom Lane wrote:

> Bruce Momjian [EMAIL PROTECTED] writes:
> > I have added this email to CVS as src/tools/error_text. Any changes to it?
>
> Waste of CVS space; the real documentation is in SGML:
> http://developer.postgresql.org/docs/postgres/error-style-guide.html
>
> regards, tom lane

--
Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] Error message style guide
Tom Lane wrote:

> It was mostly meant as a broad hint not to write "open() failed", which can clearly be written more user-friendly without loss of information. For less obvious cases we can use a mixed style. Say 'could not synchronize file %s with disk (fsync failed)'. That tells people at least that it's got something to do with their I/O subsystem.
>
> There are some places where we mention the syscall so that we can spell out the exact parameters that were passed, for possible debugging use. But this could probably be pushed to the detail message. So instead of
>
>     IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
>     (plus a long hint)
>
> perhaps
>
>     Primary:    Could not create shared memory segment: %m
>     Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o)
>     Hint:       as before
>
> Seem good?

I agree with this, but I believe the detail should really include quite a lot of detail: the file and line number where the error occurred, the error number returned by the syscall (if a syscall is involved), parameters to the function that failed, and so forth. In essence, I think enough detail should be included to make it possible to determine exactly what went wrong and, hopefully, why it went wrong.

This stuff might not be terribly useful to the end user, but it'll be of great use to a knowledgeable administrator (one of my pet peeves is software that doesn't tell you why something failed, only that it did).

--
Kevin Brown [EMAIL PROTECTED]
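
As an illustration of the kind of detail asked for above (file, line, error number, and the failing call with its parameters), here is a minimal sketch. The names format_failure_detail() and FAILURE_DETAIL() are hypothetical and do not exist in the backend; the sketch only shows that the call site and errno can be captured cheaply at the point of failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: build a detail string recording which call failed,
 * where, and with which errno. Returns what snprintf() returns.
 */
int
format_failure_detail(char *buf, size_t buflen,
                      const char *file, int line,
                      const char *call, int saved_errno)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen,
                    "Failed call was %s at %s:%d (errno %d: %s).",
                    call, file, line, saved_errno, strerror(saved_errno));
}

/* Convenience wrapper that fills in the call site automatically. */
#define FAILURE_DETAIL(buf, buflen, call, saved_errno) \
    format_failure_detail((buf), (buflen), __FILE__, __LINE__, \
                          (call), (saved_errno))
```

The caller is assumed to save errno immediately after the failing call, since later library calls may overwrite it.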
Re: [HACKERS] Error message style guide
On Fri, 14 Mar 2003, Steve Crawford wrote:

> One thing that would be great from a user's perspective (and which might reduce the volume of support questions as well) is to uniquely number all errors as in:
>
>     Error 1036: the foo could not faz the fleep

I agree with the unique codes. It does make googling for help easier.

This is how Informix does it: you get a sqlstate and what they call a 'native error'. Using SQLError (ODBC) you can get a one-liner about it, but the real meat comes from either the documentation or from the command-line program finderr. You give it the native error and it gives you a paragraph of information about the problem and what options you have.

Plus, if you have a numeric code sent back you can have an error handler that looks quite a bit nicer:

    switch (pgErrorCode)
    {
        case PG_HDD_ON_FIRE:
            die_horrifically();
            break;
        case PG_UNKNOWN_USER:
            tell_user_he_is_dumb();
            break;
    }

instead of a big pile of strcmp's.

From an efficiency standpoint, I don't know whether there would be any benefit in sending back only a native code and having the client ask for the details if it wants them.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
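
To make the switch idea above concrete, here is a small compilable sketch of a client deciding what to do from a numeric code alone. Everything in it is assumed for illustration: the code values, the PgErrorCode and ClientAction enums, and client_action() are hypothetical and are not real PostgreSQL error codes or client API.

```c
#include <assert.h>

/* Hypothetical numeric codes, for illustration only. */
typedef enum
{
    PG_UNKNOWN_USER = 1,
    PG_HDD_ON_FIRE = 2
} PgErrorCode;

/* What a client might decide to do about an error. */
typedef enum
{
    ACTION_SHOW_MESSAGE,        /* default: just display the message text */
    ACTION_REPROMPT,            /* ask the user for different input */
    ACTION_ABORT                /* give up immediately */
} ClientAction;

/* Choose an action from the numeric code, with no string comparisons. */
ClientAction
client_action(PgErrorCode code)
{
    assert((int) code > 0);     /* codes are assumed to be positive */

    switch (code)
    {
        case PG_HDD_ON_FIRE:
            return ACTION_ABORT;
        case PG_UNKNOWN_USER:
            return ACTION_REPROMPT;
    }
    /* Codes this client does not know about fall back to the message text. */
    return ACTION_SHOW_MESSAGE;
}
```

The fallback matters in practice: a client built against an older list of codes still behaves sensibly when the server sends a code it has never seen.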
Re: [HACKERS] Error message style guide
Tom Lane writes:

> I think a style guide should just say "Keep primary messages short".

Right.

> How about something like
>
>     Avoid tabs. Insert newlines as needed to keep message lines shorter than X characters. Keep in mind that client code might reformat long messages for its own purposes, so don't rely on text layout for legibility.

I would prefer leaving the formatting to the client and having the backend provide a more semantic-type markup. For example, the newline character could be considered a paragraph break and within the paragraph the text just flows. (We could hack up some line-breaking logic in psql.)

Or a really fancy solution: Use the Unicode characters for line and paragraph breaks. *Really* fancy, admittedly.

> regression=# select 'a' ### 'b';
> ERROR:  Unable to identify an operator '###' for types '"unknown"' and '"unknown"'
>         You will have to retype this query using an explicit cast

I think format_type can remain an exception to that rule, one way or the other. If there are more of these, we need to think harder.

> I'm not sure that I like making messages be utterly dependent on the presence of quotes to be decipherable. Would you consider the above message to be better phrased as, say,
>
>     ERROR:  Unable to identify an infix operator unknown ### unknown

I think the above is better. I guess I don't quite follow you here.

> This works for primary messages, I think, but not detail and hint messages. Can we use a different rule for detail/hint messages?

These rules weren't meant to apply to detail/hint. We should probably require those to be complete sentences.

> We almost uniformly use "could not open file %s: %m" for this now. Is the parenthesis style really better? I don't find it more natural. In most cases, the %m part is the actually useful information, so it seems odd to put it in parentheses. That normally indicates a subsidiary, less-important part of a sentence.

Yeah, the colon-style seems to be most wide-spread, also outside PostgreSQL.

> Nonetheless I'm not sure that avoiding references to system calls will improve matters. In particular, for cases that are really "can't happen" situations (eg, we are normally not expecting select(2) to fail), I'm not seeing the advantage of avoiding the reference.

It was mostly meant as a broad hint not to write "open() failed", which can clearly be written more user-friendly without loss of information. For less obvious cases we can use a mixed style. Say 'could not synchronize file %s with disk (fsync failed)'. That tells people at least that it's got something to do with their I/O subsystem.

--
Peter Eisentraut [EMAIL PROTECTED]
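
The "newline is a paragraph break, everything else just flows" idea from this message implies some line-breaking logic on the client side. The sketch below shows one way that could look. It is an assumption-laden illustration, not psql code: wrap_message() is a hypothetical name, it counts bytes rather than display columns, and it collapses runs of spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical client-side helper: copy msg into out, breaking lines at
 * spaces so that no line exceeds width bytes (a single word longer than
 * width is left unbroken). A '\n' in msg is kept as a paragraph break.
 * Returns the number of bytes written, excluding the terminating NUL;
 * the output is truncated if out is too small.
 */
size_t
wrap_message(const char *msg, size_t width, char *out, size_t outlen)
{
    size_t      n = 0;          /* bytes written so far */
    size_t      col = 0;        /* length of the current output line */

    assert(msg != NULL && out != NULL && outlen > 0 && width > 0);

    while (*msg != '\0' && n + 1 < outlen)
    {
        size_t      wordlen;

        if (*msg == '\n')
        {
            out[n++] = '\n';    /* paragraph break from the server */
            col = 0;
            msg++;
            continue;
        }
        if (*msg == ' ')
        {
            msg++;              /* separators are re-inserted below */
            continue;
        }

        wordlen = strcspn(msg, " \n");
        if (col > 0)
        {
            if (col + 1 + wordlen > width)
            {
                out[n++] = '\n';    /* word does not fit: start a new line */
                col = 0;
            }
            else
            {
                out[n++] = ' ';
                col++;
            }
        }
        for (size_t i = 0; i < wordlen && n + 1 < outlen; i++)
            out[n++] = msg[i];
        col += wordlen;
        msg += wordlen;
    }
    out[n] = '\0';
    return n;
}
```

With this division of labour the server never needs to know the client's line width, which is the point of leaving formatting to the client.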
Re: [HACKERS] Error message style guide
Peter Eisentraut [EMAIL PROTECTED] writes:

> I would prefer leaving the formatting to the client and having the backend provide a more semantic-type markup. For example, the newline character could be considered a paragraph break and within the paragraph the text just flows. (We could hack up some line-breaking logic in psql.)

I could live with that ... anyone have a hard time with it?

> Or a really fancy solution: Use the Unicode characters for line and paragraph breaks. *Really* fancy, admittedly.

I don't think this will fly; we'd have to express the Unicode characters as octal escapes in the error message calls, no? Way too ugly. If you think that putting soft line breaks and para breaks into message texts would be useful, I'd lean to using \n and \f to do it. But the simpler way you mentioned first seems sufficient.

> > Nonetheless I'm not sure that avoiding references to system calls will improve matters. In particular, for cases that are really "can't happen" situations (eg, we are normally not expecting select(2) to fail), I'm not seeing the advantage of avoiding the reference.
>
> It was mostly meant as a broad hint not to write "open() failed", which can clearly be written more user-friendly without loss of information. For less obvious cases we can use a mixed style. Say 'could not synchronize file %s with disk (fsync failed)'. That tells people at least that it's got something to do with their I/O subsystem.

There are some places where we mention the syscall so that we can spell out the exact parameters that were passed, for possible debugging use. But this could probably be pushed to the detail message. So instead of

    IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
    (plus a long hint)

perhaps

    Primary:    Could not create shared memory segment: %m
    Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o)
    Hint:       as before

Seem good?

BTW: in this particular case I notice that the code is set up to generate different hints depending on the value of errno. One could fake this in my original proposal using conditional expressions:

    ereport(...,
            (errno == EINVAL) ? errhint(...) : 0,
            (errno == ENOMEM) ? errhint(...) : 0,
            ...);

This seems notationally klugy though. I wonder whether it's worth making errhint's signature be

    errhint(bool condition, const char *fmt, ...)

with the hint used only if the bool parameter is true.

regards, tom lane
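
The conditional-hint signature floated in this message can be mocked up outside the backend to see how it would read at the call site. This is a sketch under assumptions, not PostgreSQL's errhint(): hint_if() is a hypothetical stand-in that formats into a caller-supplied buffer, whereas the real function would attach the hint to the error being reported.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed
 *     errhint(bool condition, const char *fmt, ...)
 * It formats the hint into buf only when condition is true and reports
 * whether a hint was written. buf is left untouched otherwise.
 */
bool
hint_if(bool condition, char *buf, size_t buflen, const char *fmt, ...)
{
    va_list     args;

    assert(buf != NULL && buflen > 0 && fmt != NULL);

    if (!condition)
        return false;

    va_start(args, fmt);
    vsnprintf(buf, buflen, fmt, args);
    va_end(args);
    return true;
}
```

A caller would then write one hint_if(saved_errno == EINVAL, ...) line per errno value instead of a chain of ?: expressions. One cost of this shape is that the hint's arguments are always evaluated, even when the condition is false.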
Re: [HACKERS] Error message style guide
Peter Eisentraut [EMAIL PROTECTED] writes:

> Some people were mentioning an error message style guide. Here's a start of one that I put together a while ago. Feel free to consider it.

Looks like a good start. But you expected quibbles, right? ;-)

> The main part of a message should be at most 72 characters long. For embedded format specifiers (%s, %d, etc.), a reasonable estimate of the expected string should be taken into account. The rest should be distributed to the detail and the hint parts.

This is not really workable to adhere to strictly. For example, a message that includes more than one user identifier (eg, a table and column name) fails the test immediately since each name might be NAMEDATALEN-1 long. Even with only one identifier, I have nine characters allowed for the error text ... less quotes and a space makes six ... less "ERROR: " leaves me with nothing. Okay, so you said "reasonable estimate" not "worst case", but unless you want to specify what you think a reasonable estimate is, this guideline is useless. I think a style guide should just say "Keep primary messages short".

> A message may not contain a newline or a tab.

This might work for primary messages given the "keep it short" dictum, but it's quite unworkable for detail and hint messages --- we have some of the latter that run to many lines. How about something like

    Avoid tabs. Insert newlines as needed to keep message lines shorter than X characters. Keep in mind that client code might reformat long messages for its own purposes, so don't rely on text layout for legibility.

> Use quotes always to denote files, database objects, and other variables of a character-string nature. Do not use them to mark up nonvariable items.

One thing that's been annoying me recently is that some of our messages exhibit double quoting, eg

    regression=# select 'a' ### 'b';
    ERROR:  Unable to identify an operator '###' for types '"unknown"' and '"unknown"'
            You will have to retype this query using an explicit cast

The reason this particular case happens is that the elog call puts (single) quotes around the result of format_type_be --- and the latter puts double quotes around names that seem to need it, which include mixed-case names and (as in this case) names that are also SQL keywords. Individually each of these choices seems defensible, but the result is mighty ugly. How can we fix it?

> NOTE: This format encourages embedding data items into the message in grammatical positions instead of the old style 'invalid value: bar'.

I'm not sure that I like making messages be utterly dependent on the presence of quotes to be decipherable. Would you consider the above message to be better phrased as, say,

    ERROR:  Unable to identify an infix operator unknown ### unknown

Throw a few spaces and random characters into the type names, and this gets very unreadable very fast. The "invalid value: bar" style has the advantage that the message text is pretty clearly separated from the object being complained about.

> Do not end the message with a period. Do not even think about ending a message with an exclamation point.
>
> RATIONALE: Avoiding punctuation makes it easier for client applications to embed the message into a variety of grammatical contexts. Often, messages are not grammatically complete sentences anyway. (And if they're long enough to be more than one sentence, split them up.)

This works for primary messages, I think, but not detail and hint messages. Can we use a different rule for detail/hint messages?

> Use lower case for message wording, including the first letter of the message. Use upper case for SQL commands and key words if the message refers to the command string.

Again, this falls down for multi-sentence hints.

> Instead of multiple sentences, consider using semicolons or commas.

Here's an example of an actual hint in the present sources. Do you really want to convert it into one run-on sentence?

    This error does *not* mean that you have run out of disk space.

    It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter (currently %d).

    The PostgreSQL Administrator's Guide contains more information about configuring your system for PostgreSQL.

> | could not open file %s (%m)
>
> RATIONALE: It would be difficult to account for all possible error codes to paste this into a single smooth sentence. It also looks better and is more flexible than colons or dashes to separate the sentences

We almost uniformly use "could not open file %s: %m" for this now. Is the parenthesis style really better? I don't find it more natural. In most cases, the %m part is the actually useful information, so it seems odd to
[HACKERS] Error message style guide
Some people were mentioning an error message style guide. Here's a start of one that I put together a while ago. Feel free to consider it.

Size of message
---------------

The main part of a message should be at most 72 characters long. For embedded format specifiers (%s, %d, etc.), a reasonable estimate of the expected string should be taken into account. The rest should be distributed to the detail and the hint parts.

RATIONALE: 72 characters is typically considered an appropriate line length on terminal-type displays. Consequently, this length is fair to psql users and readers of the server log. Also, longer messages will tend to get chatty.

Newlines, tabs
--------------

A message may not contain a newline or a tab.

RATIONALE: Messages are not necessarily displayed on terminal-type displays. In GUI displays or browsers these formatting instructions are at best ignored.

QUESTION: I think formatting characters should be avoided in detail and hint messages as well, for the same reasons.

Quotation marks
---------------

English text should use double quotes when quoting is appropriate. Text in other languages should consistently use one kind of quotes that is consistent with publishing customs and computer output of other programs.

RATIONALE: The choice of double quotes over single quotes is somewhat arbitrary, but tends to be the preferred use. Do not distinguish the kind of quotes depending on the type of object in SQL terms (i.e., strings single quoted, identifiers double quoted). This is a language-internal technical issue that many users aren't even familiar with, it won't scale to all quoted terms, it doesn't translate to other languages, and it's pretty pointless, too.

Use of quotes
-------------

Use quotes always to denote files, database objects, and other variables of a character-string nature. Do not use them to mark up nonvariable items.

RATIONALE: Objects can have names that create ambiguity when embedded in a message. Be consistent about denoting where a plugged-in name starts and ends.

NOTE: This format encourages embedding data items into the message in grammatical positions instead of the old style 'invalid value: bar'.

Punctuation
-----------

Do not end the message with a period. Do not even think about ending a message with an exclamation point.

RATIONALE: Avoiding punctuation makes it easier for client applications to embed the message into a variety of grammatical contexts. Often, messages are not grammatically complete sentences anyway. (And if they're long enough to be more than one sentence, split them up.)

Upper case vs. lower case
-------------------------

Use lower case for message wording, including the first letter of the message. Use upper case for SQL commands and key words if the message refers to the command string.

RATIONALE: It's easier to make everything look more consistent this way, since some messages are complete sentences and some not.

Grammar
-------

Use the active voice. Use complete sentences when there is an acting subject ("A could not do B"). Use telegram style without subject if the subject would be the program itself; do not use "I" for the program.

RATIONALE: The program is not human. Don't pretend otherwise.

Instead of multiple sentences, consider using semicolons or commas.

RATIONALE: This avoids peculiar punctuation if you follow the request to leave off the final period.

Present vs past tense
---------------------

There is a nontrivial semantic difference between sentences of the form

| could not open file %s

and

| cannot open file %s

The first one means that the attempt to open the file failed. The message should give a reason, such as "disk full" or "file doesn't exist". The past tense is appropriate because next time the disk might not be full anymore or the file in question may exist.

The second form indicates that the functionality of opening the named file does not exist at all in the program, or that it's conceptually impossible. The present tense is appropriate because the condition will persist indefinitely.

RATIONALE: Granted, the average user will not be able to draw great conclusions merely from the tense of the message, but since the language provides us with a grammar we should use it correctly.

Type of the object
------------------

When citing the name of an object, state what kind of object it is.

RATIONALE: Else no one will know what "foo.bar.baaz" is.

Brackets
--------

Brackets are only to be used in command synopses to denote optional arguments, or to denote an array subscript.

RATIONALE: Anything else does not correspond to widely-known customary usage and will confuse people.

Parentheses
-----------

Parentheses can be used to separate subsentences when they are generated elsewhere. For example:

| could not open file %s (%m)

RATIONALE: It would be difficult to account for all possible error codes to paste this into a single smooth sentence. It also looks better and is
Re: [HACKERS] Error message style guide
One thing that would be great from a user's perspective (and which might reduce the volume of support questions as well) is to uniquely number all errors as in:

    Error 1036: the foo could not faz the fleep

The advantages of this include:

Ease of documentation: a manual could contain a section discussing each message. Similarly, an error number could be used to easily access a web page discussing the error in more detail than a simple message allows.

Ease of searching: Google searches like "postgresql error 1036" tend to yield lots of relevant information. I've found that including an error number, where available, in a Google search yields far better results than searching with text alone.

Pinpointing trouble: unique IDs would mean that anyone looking into a specific problem would know exactly which line of code in PostgreSQL sent the error.

If one wants to get fancy, the numbers could run in series depending on the category of error, similar to http/smtp/ftp response codes.

Of course this would require appointing a keeper of the error codes who would dole them out as required to prevent dups.

Just a thought - now for a pint of Guinness.

Cheers,
Steve

On Friday 14 March 2003 4:43 pm, Peter Eisentraut wrote:

> Some people were mentioning an error message style guide. Here's a start of one that I put together a while ago. Feel free to consider it.
>
> [...]