> ----------------------------------------------------------------------
>  (0005802) calestyo (reporter) - 2022-04-15 00:41
>  https://www.austingroupbugs.net/view.php?id=1560#c5802
> ----------------------------------------------------------------------
> AFAIU, this involves now three types of changes:
>
> 1) The first one, which improves on the wording of trailing newlines.
> => seems good to me.
>
>
> 2) "comprising each character of the IFS" and similar
> "The shell shall treat the byte sequence comprising each character of the
> IFS as a delimiter"
>
> It took me a bit to understand what's meant. I would reword this,
> especially the "each" is a bit strange here, I think.
>
> AFAIU, you want to say, that any byte sequence in a word, that equals one
> of the characters in IFS is to be taken as a split point.
> So isn't that *any* character... not *each* character?
>
> What about:
> "The shell shall treat a byte sequence forming any character of the
> characters in the IFS value as a delimiter"

I like this suggestion, although "any character of the characters" is
a bit strange. I'll go with "any of the characters".

> The same in:
> "The term ``IFS white space'' is used to mean any sequence (zero or more
> instances) of the byte sequences that comprise white-space characters in
> the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any
> sequence of bytes that have the encoded values of <space> and <tab>
> characters is considered IFS white space)."
>
> rather something like:
> "The term ``IFS white space'' is used to mean any sequence (zero or more
> instances) of the byte sequences that form any of the white-space
> characters in the IFS value..."

Okay.

> Perhaps also instead of "is used to mean" just "means".

That's what's in the existing text - I didn't feel the need to change
that part.

> 3) You introduce bytes/byte sequences vs. characters.
>
> I don't understand why you need that at all?

The current wording in terms of characters implies that the word
being subjected to field splitting can be treated as a character
string. I wanted to ensure that there is no possible way to infer
that as being allowed by the new text.

> Perhaps it would be better to generally mention that somewhere in the field
> splitting chapter?

That could invite complaints that it conflicts with the use of
"character" elsewhere.

> => But there is one thing that's IMO lost on the way:
> The old:
> " any sequence of <space>, <tab>, or <newline> characters at the
> beginning or end of the input shall be ignored and any sequence of those
> characters within the input shall delimit a field"
>
> "sequence of those characters" indicated that a sequence of 1-n IFS
> characters were still regarded as one single field splitter.
>
> With the new:
> "ignored and any sequence of such bytes"
> that's IMO a bit lost... sequence of bytes is rather considered like ONE
> "multi-byte" character.

Each of the bytes in question encodes a single-byte character, so
it's impossible for them to combine to form one multi-byte character.

> You don't have that problem with the 4th change, where you explicitly say:
> "any sequence (zero or more instances) of the byte sequences that comprise
> white-space characters"

I'll insert "(one or more instances)".

--
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
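
The behaviour the wording above is trying to pin down (a run of IFS
white space acting as a single delimiter, while each non-white-space
IFS character delimits on its own) can be sketched in a few lines of
shell. This is only an illustration, assuming a POSIX-conforming sh and
the IFS value <space>/<comma>/<tab> from the example in the quoted
text; the helper name count_fields and the sample strings are
hypothetical and do not come from the bug report.

```shell
#!/bin/sh
# count_fields (hypothetical helper): split $1 using the IFS value
# given in $2 and print how many fields result.
count_fields() {
    (
        IFS=$2
        set -f        # disable pathname expansion so only splitting happens
        set -- $1     # unquoted expansion is subject to field splitting
        echo "$#"
    )
}

tab=$(printf '\t')
ifs=" ,$tab"          # IFS contains <space>, <comma>, <tab>

# A run of <space>/<tab> bytes is one delimiter.
count_fields "a  $tab b" "$ifs"     # prints 2

# Each <comma> delimits a field by itself, so an empty field appears.
count_fields "a,,b" "$ifs"          # prints 3

# Leading and trailing IFS white space is ignored, and white space
# adjacent to a <comma> belongs to the same delimiter.
count_fields "  a , b  " "$ifs"     # prints 2
```

The subshell keeps the IFS and set -f changes from leaking into the
caller, which is why the helper can be called repeatedly with
different IFS values.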
