Re: null-terminated vs. nul-terminated
On Tue, Mar 29, 2022, 5:40 AM Greg Troxel wrote:

> "David H. Gutteridge" writes:
>
> Thanks for the history and it is all sensible.
>
> > "nul-terminated" and "null-terminated" seemed more common in man pages
> > that originated from historical BSD sources, so, lacking any style
> > guide, I inferred the lowercase "nul" was more "correct" as "BSD style"
> > (excepting modern OpenBSD), even though that looks a bit odd to me. I
> > then examined where "nul-terminated" came from, and found these bulk
> > commits, which imply a standard.
> >
> > date: 2005-01-02 18:38:04 +; author: wiz;
> > Mark up NULL, and replace null by nul where appropriate.
> >
> > date: 2006-10-16 08:48:45 +; author: wiz;
> > nul/null/NULL cleanup:
> > when talking about characters/bytes, use "nul" and "nul-terminate"
> > when talking about pointers, use "null pointer" or ".Dv NULL"
> >
> > So that seemed to me the established style.
>
> It may have been BSD style, but I think it's wrong to use lowercase for
> an ASCII codepoint. And therefore it is confusing to people who know
> that the ASCII zero byte is written NUL.

FreeBSD has adopted the POSIX language (null terminated) because it mirrors the standard and the xopen folks have blanket permission to use it in open source man pages...

Warner
Re: null-terminated vs. nul-terminated
And yes I know nl is not really ascii, but lf and cr are also typically used in lower case.

This whole discussion is childish. It doesn't matter.

kre
Re: null-terminated vs. nul-terminated
    Date:        Tue, 29 Mar 2022 07:40:04 -0400
    From:        Greg Troxel
    Message-ID:

  | It may have been BSD style, but I think it's wrong to use lowercase for
  | an ASCII codepoint.

But we use soh esc nl del (etc) in lower case all the time.

You might also want to look at share/misc/ascii

kre
Re: null-terminated vs. nul-terminated
"David H. Gutteridge" writes:

Thanks for the history and it is all sensible.

> "nul-terminated" and "null-terminated" seemed more common in man pages
> that originated from historical BSD sources, so, lacking any style
> guide, I inferred the lowercase "nul" was more "correct" as "BSD style"
> (excepting modern OpenBSD), even though that looks a bit odd to me. I
> then examined where "nul-terminated" came from, and found these bulk
> commits, which imply a standard.
>
> date: 2005-01-02 18:38:04 +; author: wiz;
> Mark up NULL, and replace null by nul where appropriate.
>
> date: 2006-10-16 08:48:45 +; author: wiz;
> nul/null/NULL cleanup:
> when talking about characters/bytes, use "nul" and "nul-terminate"
> when talking about pointers, use "null pointer" or ".Dv NULL"
>
> So that seemed to me the established style.

It may have been BSD style, but I think it's wrong to use lowercase for
an ASCII codepoint. And therefore it is confusing to people who know
that the ASCII zero byte is written NUL.
Re: null-terminated vs. nul-terminated
On 2022-03-26 11:57, Roland Illig wrote:
> The term "null-terminated string" is quite common when talking about C.
> In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
> I prefer to keep "null-terminated" here.

Hi all,

While I don't really want to prolong this debate, as the committer who triggered this discussion, I felt I should respond, in part to explain why I made my choice (which I reverted, though I don't agree "null-terminated" is more correct).

TL;DR: there is no consistency here in NetBSD's code base, whether in man pages or in source code comments, and no applicable style guide I know of, but "NUL-terminated" is the most common form found. It seems there was also an attempt at standardization in man pages made in 2005-2006, settling on "nul-terminated".

I was taught (several decades ago) that the short form for the null byte or null character was NUL in ANSI C parlance (not just ASCII), and that "null-terminated" was incorrect as it's ambiguous. If someone were to say "null-byte-terminated", "null-character-terminated", or, for the other context, "null-pointer-terminated", that would be fine. "NUL-terminated" was the unambiguous contraction. (As others have pointed out, a cleverer way to avoid this debate would be to use entirely different terms.)

The most common form found in man pages currently installed in NetBSD -current is actually "NUL-terminated", by a significant margin. That's in part because many of those are from third-party projects, e.g., OpenBSD and OpenSSL, which standardized on that form. The next most common is "null-terminated", then (following slightly behind) "nul-terminated", then (much less commonly) "NULL-terminated" (which seems quite incorrect to me). I didn't look as closely at comments, but a similar pattern emerged, with "NUL-terminated" the most common under /usr/include, for example (in part due to the origins of some upstream code).
(It's not my intent here to quote or debate exact statistics, so I haven't provided any. I'm sharing my perception of practice, rightly or wrongly.)

"nul-terminated" and "null-terminated" seemed more common in man pages that originated from historical BSD sources, so, lacking any style guide, I inferred the lowercase "nul" was more "correct" as "BSD style" (excepting modern OpenBSD), even though that looks a bit odd to me. I then examined where "nul-terminated" came from, and found these bulk commits, which imply a standard.

  date: 2005-01-02 18:38:04 +; author: wiz;
  Mark up NULL, and replace null by nul where appropriate.

  date: 2006-10-16 08:48:45 +; author: wiz;
  nul/null/NULL cleanup:
  when talking about characters/bytes, use "nul" and "nul-terminate"
  when talking about pointers, use "null pointer" or ".Dv NULL"

So that seemed to me the established style.

Regards,
Dave
Re: null-terminated vs. nul-terminated
Taylor R Campbell writes:

>> Date: Sat, 26 Mar 2022 16:53:19 +0100
>> From: Roland Illig
>>
>> The term "null-terminated string" is quite common when talking about C.
>> In contrast, the word "nul" in "nul-terminated" always reminds me of
>> the character abbreviation in ASCII, which has a narrower scope than C.
>> I prefer to keep "null-terminated" here.
>
> I feel like I've usually seen it as NUL-terminated. I thought it was
> in /usr/share/misc/style but I must have been thinking of a different
> style guide.
>
> `NUL' is better than `null' or `NULL' here because it's not a null
> pointer, unlike, e.g., the execve argv terminator. Even if the string
> isn't US-ASCII, what character encoding calls a nonzero byte `NUL'?
>
> `NUL' is better than `zero' or `0' here because it's unambiguously the
> all-bits-zero byte, not the US-ASCII encoding of `0' (i.e., decimal 48
> or 0x30).

For background, I'm a native en_US speaker whose second computer language was K&R C from the pre-ANSI edition.

There are three separate concepts:

  NULL   Refers to a pointer, never to a character.

  NUL    The ASCII codepoint: 7 zero bits, and 8 zero bits when stored in
         an 8-bit byte. NUL is never properly written nul; the ASCII
         codepoints are upper case in formal usage.

  null   An English word that can mean various things, including
           null pointer => NULL
           null character => NUL in ASCII
           null character => 0 in something else, theoretically maybe, but
             C just cannot deal with a character set that uses 0 to
             represent something that gets used in strings.

So one can talk about a "null-terminated string", implying "null character", which means NUL, and one could also write "NUL-terminated string".

I find the form NUL-terminated to be artificial. I perceive "nul-terminated" as an error due to the lower case nul.
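A small C sketch of Greg's first two concepts (not from the thread; the helper names nul_offset and is_empty_string are made up for illustration). NUL is the zero-valued character that ends a string; NULL is a pointer value that names no string at all, which is why an empty string and a NULL pointer are different things.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes before the first NUL ('\0'), i.e. what strlen() reports. */
size_t nul_offset(const char *s)
{
    size_t n = 0;

    assert(s != NULL);          /* NULL is a pointer, not a string */
    while (s[n] != '\0')        /* '\0' is the null character, NUL in ASCII */
        n++;
    return n;
}

/*
 * An empty string is a valid pointer to a NUL byte; a NULL pointer is no
 * string at all.  The two are easy to conflate when both are called "null".
 */
int is_empty_string(const char *s)
{
    return s != NULL && s[0] == '\0';
}
```

For example, nul_offset("ab\0cd") is 2, because the string ends at the first NUL even though the array holds more bytes.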
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
> On Mar 26, 2022, at 9:39 AM, Taylor R Campbell
> wrote:
>
> `C string' is ambiguous because there are also char arrays that
> function as strings but which are not guaranteed to be NUL-terminated,
> as strncpy is intended for.

A non-terminated char array is not a C-string. The term C-string is not ambiguous. This is something that, amazingly, even Internet trolls appear to agree on. However, they do disagree as to the spelling of the terminating character's name, which is why I think it's best to elide it altogether.

-- thorpej
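Taylor's strncpy point and thorpej's reply can be seen side by side in a short C sketch. The helper copy_field is hypothetical; it fills a fixed-width field with strncpy() and reports whether the result also happens to be a C-string, which depends only on whether a NUL fit within the n bytes written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill a fixed-width char field with strncpy() and
 * report whether the result is also a C-string.  strncpy() writes exactly
 * n bytes, padding with NULs when src is shorter than n, but writing no
 * NUL at all when strlen(src) >= n.
 */
int copy_field(char *dst, size_t n, const char *src)
{
    assert(dst != NULL && src != NULL && n > 0);
    strncpy(dst, src, n);
    /* Terminated only if some NUL landed within the n bytes written. */
    return memchr(dst, '\0', n) != NULL;
}
```

With char buf[4], copying "abc" leaves a terminated string, while copying "abcd" fills all four bytes with no NUL: a valid fixed-width field, but not a C-string.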
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
> On Mar 26, 2022, at 9:09 AM, Warner Losh wrote:
>
> Since all the 'C' standards[*] use "null-terminated" and "null character",
> it's likely best to use that terminology because there is a source of truth
> for its definition in case of ambiguity or doubt.

Ah, but you're giving up the opportunity to use indirection to solve the problem. By calling it a "C-string", those who care what the standard calls the terminating character can go look it up! :-)

-- thorpej
Re: null-terminated vs. nul-terminated
On 26.03.2022 at 17:09, Warner Losh wrote:
> [*] I've not gone the extra mile and checked to see if K&R used this
> phrase, to be honest.

It does. The book from 1978 says in its tutorial section:

> getline puts the character \0 (the null character, whose value
> is zero) at the end of the array it is creating, to mark the
> end of the string of characters.

Interestingly, the section containing the above quote is named "Character Arrays", not "Strings".

The definition of "string" on page 181 doesn't mention the word "terminated"; it gives the name "null byte" to the \0.

So using the word "null" to mean all kinds of nothing, including a null pointer, a null byte and a null character, has a long tradition.

Roland
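The technique in the K&R quote, storing the null character right after the last byte copied, looks roughly like this in C. This is a hypothetical variant named copy_line, not the book's code: it reads one line from a string rather than from stdin, and the name avoids a clash with POSIX getline().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy at most lim-1 bytes of the first line of src into s, keeping the
 * newline if it fits, then store '\0' to mark the end of the string.
 * Returns the number of bytes stored before the '\0'.
 */
size_t copy_line(char s[], size_t lim, const char *src)
{
    size_t i = 0;

    assert(s != NULL && src != NULL && lim > 0);
    while (i < lim - 1 && src[i] != '\0' && src[i] != '\n') {
        s[i] = src[i];
        i++;
    }
    if (i < lim - 1 && src[i] == '\n') {   /* keep the newline, as K&R does */
        s[i] = '\n';
        i++;
    }
    s[i] = '\0';                           /* the null character ends the string */
    return i;
}
```

Because the '\0' always needs a slot, a buffer of lim bytes holds at most lim-1 characters of the line.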
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
On Sat, Mar 26, 2022 at 9:53 AM Roland Illig wrote:

> On 24.03.2022 at 02:55, David H. Gutteridge wrote:
> > Module Name:    src
> > Committed By:   gutteridge
> > Date:           Thu Mar 24 01:55:15 UTC 2022
> >
> > Modified Files:
> >         src/lib/libc/gen: popen.3
> >
> > Log Message:
> > popen.3: minor spelling, grammar, style, and xref tweaks
> >
> >
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.22 -r1.23 src/lib/libc/gen/popen.3
>
> The term "null-terminated string" is quite common when talking about C.
> In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
> I prefer to keep "null-terminated" here.

The standard uses "null-terminated" and "null character" (see Character Sets, section 5.2.1; this is from the C2x draft, but the term dates back to C89):

"A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string."

I couldn't find the definition for null-terminated though.

This is different from the NULL #define.

Not to be confused with the all-zeros ASCII character, whose mnemonic is NUL, which is where some pressure to use NUL terminated comes from. I agree that its usage is narrower and really only relevant for certain ASCII and ASCII-derived character sets, which is why the standard chose the spelling it did.

Since all the 'C' standards[*] use "null-terminated" and "null character", it's likely best to use that terminology because there is a source of truth for its definition in case of ambiguity or doubt.

Warner

[*] I've not gone the extra mile and checked to see if K&R used this phrase, to be honest.
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
> Date: Sat, 26 Mar 2022 16:53:19 +0100
> From: Roland Illig
>
> The term "null-terminated string" is quite common when talking about C.
> In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
> I prefer to keep "null-terminated" here.

I feel like I've usually seen it as NUL-terminated. I thought it was in /usr/share/misc/style but I must have been thinking of a different style guide.

`NUL' is better than `null' or `NULL' here because it's not a null pointer, unlike, e.g., the execve argv terminator. Even if the string isn't US-ASCII, what character encoding calls a nonzero byte `NUL'?

`NUL' is better than `zero' or `0' here because it's unambiguously the all-bits-zero byte, not the US-ASCII encoding of `0' (i.e., decimal 48 or 0x30).

`C string' is ambiguous because there are also char arrays that function as strings but which are not guaranteed to be NUL-terminated, as strncpy is intended for.
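Taylor's two contrasts can be shown in a few lines of C: the NULL pointer that ends an execve argv vector versus the NUL byte that ends each string in it, and NUL versus the digit '0'. This is only a sketch; count_args and is_nul are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * An argv vector, as passed to execve(2), ends with a NULL pointer:
 * count the entries before it.
 */
size_t count_args(char *const argv[])
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}

/*
 * Each argv[i] is itself a string that ends with a NUL byte ('\0').
 * The digit '0' is an ordinary character (0x30 in ASCII) and is never NUL.
 */
int is_nul(char c)
{
    return c == '\0';
}
```

So in char *v[] = { "ls", "-l", NULL }, the vector is NULL-terminated while "-l" is NUL-terminated: two different terminators in one data structure.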
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
> On Mar 26, 2022, at 9:17 AM, Martin Husemann wrote:
>
> When talking about it I prefer "zero terminated", or C-string, in
> contrast to C++ std::string (which are objects) or Pascal strings
> (which have an explicit length at the beginning).

Yes, I also prefer the term "C-string".

-- thorpej
Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
On Sat, Mar 26, 2022 at 04:53:19PM +0100, Roland Illig wrote:
> The term "null-terminated string" is quite common when talking about C.

NULL-terminated lists/arrays are quite common, but NULL is a pointer, and the string is terminated by a 0 char (sometimes spelled as \0 in a string literal, but implicitly added by the compiler at the end of a literal, and spelled as NUL in the ASCII table).

> I prefer to keep "null-terminated" here.

I think it is a bug.

When talking about it I prefer "zero terminated", or C-string, in contrast to C++ std::string (which are objects) or Pascal strings (which have an explicit length at the beginning).

Martin
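Martin's contrast between a C-string and a Pascal string, and his note that the compiler appends the \0 to a literal, can be sketched in C. The struct pstring layout (one length byte, up to 255 data bytes) and the to_pstring helper are assumptions for illustration, not a definition taken from any particular Pascal implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical Pascal-style string: an explicit length byte up front,
 * so no terminator is needed (and a zero byte could be ordinary data).
 */
struct pstring {
    unsigned char len;
    char data[255];
};

/* Convert a C-string (ended by '\0') to the length-prefixed form, truncating at 255 bytes. */
struct pstring to_pstring(const char *cs)
{
    struct pstring p = { 0 };
    size_t n;

    assert(cs != NULL);
    n = strlen(cs);                 /* scans forward to the terminating '\0' */
    if (n > sizeof p.data)
        n = sizeof p.data;
    p.len = (unsigned char)n;
    memcpy(p.data, cs, n);          /* the '\0' itself is not copied */
    return p;
}
```

With this layout, sizeof "abc" is 4 because of the implicit '\0' the compiler adds, while to_pstring("abc").len is 3: the length byte replaces the terminator.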
null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)
On 24.03.2022 at 02:55, David H. Gutteridge wrote:
> Module Name:    src
> Committed By:   gutteridge
> Date:           Thu Mar 24 01:55:15 UTC 2022
>
> Modified Files:
>         src/lib/libc/gen: popen.3
>
> Log Message:
> popen.3: minor spelling, grammar, style, and xref tweaks
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.22 -r1.23 src/lib/libc/gen/popen.3

The term "null-terminated string" is quite common when talking about C. In contrast, the word "nul" in "nul-terminated" always reminds me of the character abbreviation in ASCII, which has a narrower scope than C. I prefer to keep "null-terminated" here.

Roland