A NOTE has been added to this issue.
======================================================================
https://www.austingroupbugs.net/view.php?id=1649
======================================================================
Reported By:                kre
Assigned To:
======================================================================
Project:                    Issue 8 drafts
Issue ID:                   1649
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Robert Elz
Organization:
User Reference:
Section:                    XCU 2.6.5
Page Number:                2476
Line Number:                80478 - 80504
Final Accepted Text:
======================================================================
Date Submitted:             2023-03-31 01:55 UTC
Last Modified:              2023-09-07 14:54 UTC
======================================================================
Summary:                    Field splitting is woefully under specified, and in places, simply wrong
======================================================================

----------------------------------------------------------------------
 (0006465) kre (reporter) - 2023-09-07 14:54
 https://www.austingroupbugs.net/view.php?id=1649#c6465
----------------------------------------------------------------------
Apologies for the mess with the original version of the results. Those
reading this via the mailing list will note that I was using angle brackets
as the field delimiter characters, to show what is in each field, and totally
forgot that mantis would interpret those, so I have just done a quick switch
to use square brackets, which works for the note, but is much uglier to look at.

Anyway, here is the (now using []) shell strawman implementation of the
algorithm in https://www.austingroupbugs.net/view.php?id=1649#c6460 .

Again, this is truncated: most of the actual test cases are omitted, though
you can deduce what they are from the results in
https://www.austingroupbugs.net/view.php?id=1649#c6464

This test does (or did, before I fiddled just now with the "args()" function
which prints the results) produce identical results to the version run by
the shell, in https://www.austingroupbugs.net/view.php?id=1649#c6464 - so
I am not going to include those again.

I have run this with every reasonable shell I have (not pdksh, and not zsh,
as I don't really understand its differences). All of them (mksh included)
produce the same results here, so I believe the code is portable enough.

# This is a dummy implementation of the proposed field splitting
# algorithm (written in sh, so hopefully sh people can follow it)
# to demonstrate that the algorithm as presented generates the
# expected output (that generated by almost every shell).

# This code knows that in the tests IFS=' ,' (space and comma)
# and rather than handling that generically, which would be possible,
# but messy, simply builds those two characters (literally) into the
# implementation (space, as an IFS white space char, and comma as an
# IFS char that is not white space).

# Similarly the code "knows" that if there is a prefix in the field
# (chars not to be treated as generated by an expansion, and hence
# exempt from splitting) that will be simply a single 'p' always, and
# similarly a suffix will be 'q' - because of that we do not need to
# have any method to indicate what part of the field is to be subject
# to field splitting.

# In the following, comments that start '##' are text lifted directly
# from my proposed section 2.6.5 ("Field Splitting") text, which might
# allow readers to match this algorithm with what is described there.

# The results from this test match exactly the results from all shells
# considered to operate correctly (the same output routine is used, and
# the results compared with diff - with zero differences).

S=' '
C=','

field_split()
{
    ARG=$1      # the field that needs to be split
    set --      # the set of output fields, initially empty

    # IFS is defined (IFS=' ,') and not empty, IFS white space is ' '
    # We simply know that!

    # C is our candidate field,
    # CD indicates the delimiter that terminated the candidate field
    #     ' ' indicates the delimiter was IFS white space alone
    #     ',' indicates the delimiter was a ',' (perhaps with white space)
    #     ''  indicates there has been no delimiter
    C=
    CD=

    ## Each expansion, or substitution shall be processed in order
    ## as follows [...]

    ## While the input is not empty...
    while test -n "${ARG}"
    do
        ## Consider the first remaining character of the input.
        ## If it is:

        ## a. A character that did not result from an unquoted
        ##    expansion or substitution:
        ## b. A character in the input that is not a character in IFS:

        # since we know exactly what the IFS chars are, and that
        # chars that did not result from an expansion (etc) are not
        # IFS chars (our test cases ensure that) we don't need to
        # treat those two differently, just skip forward until we
        # get to an IFS char, or we run out, appending the non-IFS
        # chars to the candidate and removing them from the input.

        # here we only care about the current first char in ${ARG}
        while
            case "${ARG}" in
            '')     break 2     # the end of the input, done
                    ;;
            [\ ,]*) false       # delimiter located, exit loop
                    ;;
            *)      TAIL=${ARG#?}               # something else
                    C=${C}${ARG%"${TAIL}"}      # appended to candidate
                    ARG=${TAIL}                 # removed from input
                    ;;
            esac
        do
            :
        done

        # Now we are at the start of a delimiter in ARG, and the
        # candidate field is C

        # which kind of delimiter do we have?

        ## c. An IFS white space character:

        # assume the delim will be just IFS white space (case 'c')
        CD=' '
        # and then skip any of that we find (repeating 'c' over & over)
        while case "${ARG}" in ' '*) ARG=${ARG#* };; *) false;; esac
        do :; done

        ## d. Another IFS character, not IFS white space:

        # Next if we have a non white space IFS char,
        # then it is the other kind of delimiter (case 'd' in the algo)
        case "${ARG}" in
        ,*) CD=, ; ARG=${ARG#,}     # Remember we saw it, then remove
            # and skip any following IFS white space
            while case "${ARG}" in ' '*) ARG=${ARG#* };; *) false;; esac
            do :; done
            ;;
        esac

        # now a field has been delimited so we are subject to:

        ## At this point, if the candidate is not empty, or if a
        ## non IFS white space character was seen at step d, then
        ## the candidate becomes an output field.
        ## In either case, empty the candidate, and perform the
        ## next iteration.

        if test -n "${C}"   # candidate is not empty (or...) => output
        then
            ## if the candidate is not empty
            ## then the candidate becomes an output field.
            set -- "$@" "'${C}'"

        # otherwise the candidate is empty; if it was delimited
        # by only IFS white space, then the candidate is dropped

        elif test "${CD}" != ' '
        then
            ## or if a non IFS white space character was seen
            ## then the candidate becomes an output field.
            set -- "$@" "''"    # no need for $C, it is ""
        fi

        ## In either case, empty the candidate, and perform
        ## the next iteration.
        CD=
        C=
    done

    ## When the input is empty, if the candidate is not empty, it
    ## becomes an output field.

    if test -n "${C}"
    then
        # not an empty field after last delim, so it is included
        set -- "$@" "'${C}'"
    fi

    # return the split field, as a list of quoted words (to become fields)
    printf %s "$*"
}

args()
{
    name=$1; shift
    printf '%s:\t%d:\t' "$name" "$#"
    printf '[%s]' "$@"
    printf '\n'
}

tst()
{
    N=$1
    eval set -- $(field_split "$2")
    args "$N" "$@"
}

W='abc'
SW=' abc'
WS='abc '
SWS=' abc '
CW=',abc'
WC='abc,'
CWC=',abc,'
WSW='abc def'
WSSW='abd def'

# and many more definitions like that
# followed by the actual test invocations

tst W "$W"
tst SW "$SW"
tst WS "$WS"
tst SWS "$SWS"
tst CW "$CW"
tst WC "$WC"
tst CWC "$CWC"
tst WSW "$WSW"
tst WSSW "$WSSW"
tst WCW "$WCW"
tst WCCW "$WCCW"
tst WSCW "$WSCW"
tst WCSW "$WCSW"
tst WSCSW "$WSCW"
tst WSCSCSW "$WSCSCSW"

# and many more.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2023-03-31 01:55 kre            New Issue
2023-03-31 01:55 kre            File Added: ifs
2023-03-31 01:55 kre            Name                      => Robert Elz
2023-03-31 01:55 kre            Section                   => XCU 2.6.5
2023-03-31 01:55 kre            Page Number               => 2476
2023-03-31 01:55 kre            Line Number               => 80478 - 80504
2023-07-31 16:13 Don Cragun     Note Added: 0006412
2023-09-07 14:14 kre            Note Added: 0006459
2023-09-07 14:15 kre            Note Added: 0006460
2023-09-07 14:30 kre            Note Added: 0006462
2023-09-07 14:32 kre            Note Added: 0006463
2023-09-07 14:41 kre            Note Deleted: 0006463
2023-09-07 14:43 kre            Note Edited: 0006462
2023-09-07 14:45 kre            Note Added: 0006464
2023-09-07 14:54 kre            Note Added: 0006465
======================================================================
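
The note says the strawman's output was compared against the version run by the shell itself. As a rough sketch of what such a reference run might look like (this is not the script from note 6464; the helper name native_split is hypothetical, and IFS=' ,' is assumed as in the tests), the shell can be asked to split an unquoted expansion and print the fields in the same [] style as args():

```shell
# Hypothetical helper, not part of the posted script: let the shell
# itself perform field splitting with IFS=' ,' (assumed, as in the tests).
native_split()
{
    (   # subshell, so the IFS and set -f changes do not leak out
        name=$1 val=$2
        IFS=' ,'
        set -f          # no pathname expansion, only field splitting
        set -- $val     # unquoted on purpose: this is the split under test
        printf '%s:\t%d:\t' "$name" "$#"
        printf '[%s]' "$@"
        printf '\n'
    )
}

native_split WCW   'abc,def'     # 2 fields: [abc][def]
native_split WCCW  'abc,,def'    # 3 fields: [abc][][def]
native_split CWC   ',abc,'       # 2 fields: [][abc]
native_split WSCSW 'abc , def'   # 2 fields: [abc][def]
```

Saving the output of this and of the tst invocations to two files and running diff on them would give the kind of comparison described in the note. As with args(), a result of zero fields still prints a single [].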