    Date:        Fri, 23 Aug 2024 23:47:06 +0200
    From:        Steffen Nurpmeso <stef...@sdaoden.eu>
    Message-ID:  <20240823214706.oskn9OEF@steffen%sdaoden.eu>

  | So IFS whitespace only if part of $IFS.

That is the definition of IFS whitespace.

  | So this "adjacent" even if *not* part of $IFS.

No, only characters that are in IFS are ever delimiters (really terminators).

  | So this means that *regardless* of whatever $IFS is, the three IFS
  | whitespace characters are $IFS anyway *if* that is set to
  | a non-empty non-default value.

No.   Only if they are in IFS.   If we have IFS=': ' then colon and
space are IFS characters, space is IFS whitespace, and tab and newline
are simply characters.
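
A minimal sketch of that case, assuming a POSIX sh.  The names A, B
and tab are made up for illustration; argc is the helper defined
further down in this message.

```shell
# Helper from later in this message: prints the number of fields.
argc() { echo $#; }

IFS=': '                 # colon and space are the only IFS characters
tab=$(printf '\t')       # a literal tab, which is NOT in IFS here
A="one${tab}two"         # the tab is just an ordinary character
B='one two:three'        # space and colon both terminate fields

n_tab=$(argc $A)         # 1: nothing in A is an IFS character
n_mix=$(argc $B)         # 3: "one" "two" "three"
echo "$n_tab $n_mix"

unset IFS                # back to default splitting behaviour
```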

What is important about space (0x20), tab (0x09) and newline (0x0a)
is that if they appear in IFS, they are IFS whitespace.   Whether
other characters for which isspace() might return true (or the wide
equivalent thereof where appropriate) are IFS whitespace or not
is implementation defined (and usually, not).

Since you're clearly looking at the new (Issue 8) standard, look
at the 6th paragraph of XCU 2.6.5 which starts "For the purposes
of this section,..." and goes on to define exactly what the
term "IFS whitespace" means.

  |  If the  value  of IFS is null, no word splitting occurs.

Correct.
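
For illustration, a small sketch (assuming a POSIX sh; the variable
name V is hypothetical) contrasting a null IFS with an unset IFS,
where an unset IFS behaves as if it were space, tab and newline:

```shell
# Prints the number of fields it was given.
argc() { echo $#; }

V='a b:c
d'                       # contains a space, a colon and a newline

IFS=''                   # null IFS: no field splitting at all
n_null=$(argc $V)        # 1

unset IFS                # unset IFS: splits as if IFS were <space><tab><newline>
n_unset=$(argc $V)       # 3: "a" "b:c" "d"

echo "$n_null $n_unset"
```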

  | I have to say i still have a lot of problems wrapping my head
  | against the term

Almost everyone has problems understanding how field splitting
really works.   It is odd (historically odd).

  | It seems to me, now, that the actual point here is that IFS
  | whitespace can give no empty output in say a IFS=: case, whereas
  | the colon in $IFS *can* create empty output tokens.

First, fields, not tokens.   Tokens are what passes from lexical
analysis to the parser (and IFS has nothing whatever to do with that);
no parsing happens during field splitting.

But yes.   With F=foo::bar and G='foo  bar' (two spaces) then
with IFS=' :' (the order only matters when expanding "$*")
argc $F gives 3 and argc $G gives 2.   [argc is just argc() { echo $#; }]
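
On the order remark: a short sketch, assuming a POSIX sh, of how the
first character of IFS is used to join the positional parameters in
"$*".  The names s1 and s2 are made up for illustration.

```shell
set -- a b c             # three positional parameters

IFS=' :'                 # space first
s1="$*"                  # joined with a space: a b c

IFS=': '                 # colon first
s2="$*"                  # joined with a colon: a:b:c

unset IFS
echo "$s1"
echo "$s2"
```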

If H were ' foo:  : bar: ' then argc $H would be 3  ("foo", "" and "bar").
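
The three cases can be checked with something like the following,
assuming a POSIX sh.  The show helper is hypothetical (it is not part
of the original discussion) and just brackets each field so the empty
one is visible.

```shell
# Prints the number of fields it was given.
argc() { echo $#; }
# Hypothetical helper: prints each field wrapped in <>, all on one line.
show() { for f in "$@"; do printf '<%s>' "$f"; done; echo; }

IFS=' :'
F=foo::bar
G='foo  bar'             # two spaces
H=' foo:  : bar: '

nF=$(argc $F)            # 3: each colon terminates a field, so "" appears
nG=$(argc $G)            # 2: a run of IFS whitespace is a single delimiter
nH=$(argc $H)            # 3
fields_H=$(show $H)      # <foo><><bar>

unset IFS
echo "$nF $nG $nH $fields_H"
```

Whitespace next to a colon is absorbed into that delimiter, which is
why H yields only one empty field and nothing for its leading and
trailing spaces.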

kre

ps: the hope was that the text that is now in 2.6.5 would finally be
explicit enough that it would be possible to read that, completely,
and then implement it properly.


