Re: awk: FS matching 0 or more characters
2020-02-03 15:10:29 -0800, Don Cragun:
[...]
> "The search for a matching sequence starts at the beginning
> of a string and stops when the first sequence matching the
> expression is found, where ``first'' is defined to mean
> * ``begins earliest in the string''. If the pattern permits
> * a variable number of matching characters and thus there is
> * more than one such sequence starting at that point, the
> * longest such sequence is matched. For example, the BRE
> "bb*" matches the second to fourth characters of the
> string "abbbc", and the ERE "(wee|week)(knights|night)"
> matches all ten characters of the string "weeknights".
>
> * "Consistent with the whole match being the longest of the
> * leftmost matches, each subpattern, from left to right,
> * shall match the longest possible string. For this purpose,
> * a null string shall be considered to be longer than no
> * match at all. For example, matching the BRE "\(.*\).*"
> * against "abcdef", the subexpression "(\1)" is "abcdef",
> * and matching the BRE "\(a*\)*" against "bc", the
> * subexpression "(\1)" is the null string.
>
> "When a multi-character collating element in a bracket
> expression (see Section 9.3.5, on page 184) is involved,
> the longest sequence shall be measured in characters
> consumed from the string to be matched; that is, the
> collating element counts not as one element, but as the
> number of characters it matches."
>
> Noting the part of this definition that is on lines shown above
> with a leading asterisk, I believe the standard is clear and
> that Busybox awk does not conform.
[...]

Not sure how you reached that conclusion. It seems to me, on the contrary, that that text alone would mean that busybox awk is the only compliant implementation.

When it comes to sed or grep, all implementations agree with busybox awk.
$ echo bbb | gsed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | busybox sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | solaris-sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | solaris-xpg4-sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo aaa | grep 'b*'
aaa

The special behaviour of the original awk, mawk and gawk is, AFAICT, an undocumented deviation, and it seems to apply only to FS processing (and split()). sub(), gsub(), match() and /.../ will happily match an empty string.

$ echo bbb | /usr/xpg4/bin/awk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | gawk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | mawk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | gawk '/a*/'
bbb

To account for those implementations, POSIX should say that when the split regexp matches an empty string, it's undefined whether that empty string is taken as a field separator or ignored. In any case, after an empty match, matching resumes at the next character, not at the end of the matched text; otherwise it would loop indefinitely (as ast-open's grep -o 'a*' does).

-- 
Stephane
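A minimal sketch of the two readings Stephane proposes to allow, assuming a POSIX awk that provides match(), RSTART and RLENGTH. The split_demo function is hypothetical, written only for this illustration. Mode skip ignores empty matches, which reproduces the output reported for onetrueawk, mawk, gawk and Solaris awk on Martijn's example; mode keep lets an empty match end a non-empty field, which happens to reproduce the BusyBox output quoted in this thread for that input (no claim is made that it mirrors BusyBox in general). In both modes the scan moves one character past an empty match, so it cannot loop.

```sh
#!/bin/sh
# split_demo MODE STRING ERE  (hypothetical helper, for illustration only)
#   MODE=skip: empty matches never delimit fields
#   MODE=keep: an empty match ends the current field if it is non-empty
# Prints one field per line. Anchored EREs (^, $) are not handled.
split_demo() {
  awk -v mode="$1" -v s="$2" -v re="$3" 'BEGIN {
    start = 1; i = 1; n = length(s)
    while (i <= n) {
      # Length of the longest match of re that starts exactly at i, or -1.
      len = (match(substr(s, i), re) && RSTART == 1) ? RLENGTH : -1
      if (len > 0) {
        # Non-empty match: always a separator; resume right after it.
        print substr(s, start, i - start)
        i += len; start = i
      } else if (len == 0 && mode == "keep" && i > start) {
        # Empty match after at least one character: end the field here.
        # i is not advanced now; the next pass sees i == start and moves on.
        print substr(s, start, i - start)
        start = i
      } else {
        i++   # no separator at i: advance one character
      }
    }
    print substr(s, start)   # the last field
  }'
}

split_demo skip 'one!two!!three!!!end' '!*' | paste -s -d ' ' -
split_demo keep 'one!two!!three!!!end' '!*' | paste -s -d ' ' -
```

The first call prints one two three end and the second prints o n e t w o t h r e e e n d.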
Re: awk: FS matching 0 or more characters
Hi Martijn,

In the description of REs in the standard, "match" is described (on P181-182, L5969-5993 in the 2017 edition of the standard) as:

"A sequence of zero or more characters shall be said to be matched by a BRE or ERE when the characters in the sequence correspond to a sequence of characters defined by the pattern.

"Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. This means that if a character set contains two or more encodings for a graphic symbol, or if the strings searched contain text encoded in more than one codeset, no attempt is made to search for any other representation of the encoded symbol. If that is required, the user can specify equivalence classes containing all variations of the desired graphic symbol.

"The search for a matching sequence starts at the beginning
of a string and stops when the first sequence matching the
expression is found, where ``first'' is defined to mean
* ``begins earliest in the string''. If the pattern permits
* a variable number of matching characters and thus there is
* more than one such sequence starting at that point, the
* longest such sequence is matched. For example, the BRE
"bb*" matches the second to fourth characters of the
string "abbbc", and the ERE "(wee|week)(knights|night)"
matches all ten characters of the string "weeknights".

* "Consistent with the whole match being the longest of the
* leftmost matches, each subpattern, from left to right,
* shall match the longest possible string. For this purpose,
* a null string shall be considered to be longer than no
* match at all. For example, matching the BRE "\(.*\).*"
* against "abcdef", the subexpression "(\1)" is "abcdef",
* and matching the BRE "\(a*\)*" against "bc", the
* subexpression "(\1)" is the null string.

"When a multi-character collating element in a bracket
expression (see Section 9.3.5, on page 184) is involved,
the longest sequence shall be measured in characters
consumed from the string to be matched; that is, the
collating element counts not as one element, but as the
number of characters it matches."

Noting the part of this definition that is on lines shown above with a leading asterisk, I believe the standard is clear and that Busybox awk does not conform.

Hope this helps,
Don

> On Feb 3, 2020, at 1:50 PM, Martijn Dekker wrote:
>
> Consider:
>
> echo 'one!two!!three!!!end' | awk -v 'FS=!*' \
> '{ for (i=NF; i>0; i--) print $i; }'
>
> Onetrueawk, mawk, GNU awk, and Solaris awk all print:
>
>> end
>> three
>> two
>> one
>
> However, Busybox awk prints:
>
>> d
>> n
>> e
>> e
>> e
>> r
>> h
>> t
>> o
>> w
>> t
>> e
>> n
>> o
>
> In a way, the Busybox awk behaviour makes more sense. The "!*" ERE means:
> match zero or more "!", and that's exactly what it did.
>
> Changing the ERE to '!+' makes all awks behave consistently, so that's the
> obvious fix.
>
> But what, if anything, does POSIX have to say about an FS ERE matching zero
> or more characters?
>
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04
>
> I can only find:
>
>> 1. If FS is a null string, the behavior is unspecified.
>
> That doesn't really apply; FS is a non-null ERE, though one that may match
> the null string.
>
>> 3. [...] Each occurrence of a sequence matching the extended regular
>> expression shall delimit fields.
>
> Is a null string matching the ERE a "sequence" that matches it?
>
> So at this point I'm not sure whether to report a bug in Busybox awk, or an
> area in the standard that needs further specification or clarification, or
> neither...
>
> - Martijn
>
> --
> modernish -- harness the shell
> https://github.com/modernish/modernish
>
Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines
Robert Elz wrote in <28486.1580735...@jinx.noi.kre.to>:
 |Date:        Fri, 31 Jan 2020 23:18:31 +0100
 |From:        Steffen Nurpmeso
 |Message-ID:  <20200131221831.vapcz%stef...@sdaoden.eu>
 |
 || May i ask whether you have numbers on how often "." is used with
 || $PATH searching?
 |
 |Of course you may ask.
 |
 |But to save you actually doing that, no, I have no idea, like many
 |other things in this area all that we really know is that it might
 |be, as that's how it was specified to work - even if none of us
 |usually ever use it that way.
 |But the relevant change here was identical for both "." and "exec"
 |and "exec" using PATH search is rather more likely.
 |
 || But, then: for explicit relative
 || file names, using ./ is a way to accomplish escaping.
 |
 |Of course, but if that were adequate for the original issue, the
 |reported problem would not have needed a change at all. That's
 |a different kettle of fish. I'm assuming (without any personal
 |evidence of it) that there actually is/was a problem to fix, and
 |then simply asking about whether the fix that was made perhaps over
 |does things a little (requires more than was strictly necessary).

I see; after having read the issue, it was from 2010.
I use functions which do the path search for me; you and Stephane have fixed bugs in them, as you possibly remember. This is because command -v is not really usable. So i, among others, have found ways to work around issues. I have found only one script which does './exec "$shellvar"' where $shellvar is not a readily prepared path, and that is in a release script which is known to run locally.
But yes, of course, having -- is an improvement.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: raise(0) (was: Exit status 128)
No, they won't laugh. If the intent was to exclude a "normal zero", as 6.2.6.2p3 refers to it, the text "positive non-zero value" is explicitly required. Since it isn't there, which I'm fine with, maybe someone else should file a defect report with them.

On Monday, February 3, 2020 Geoff Clare wrote:

Shware Systems wrote, on 03 Feb 2020:
>
>> C99 only specifies the behaviour of raise() and signal() for SIGABRT,
>> SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM. The behaviour for all
>> other "sig" argument values is either implementation-defined or undefined.
>> 7.14 para 4 says "The complete set of signals, their semantics, and
>> their default handling is implementation-defined; all signal numbers
>> shall be positive." This means that "the complete set of signals"
>> for which an implementation defines the behaviour cannot include 0,
>> because 0 is not positive (see below). Thus the behaviour for 0 is
>> not implementation-defined and must therefore be undefined.
>> Yes, 0 is not "positive". When used on its own, "positive" means
>> "greater than zero", ...
> [Zero] has the same sign bit value as other positive values so is
> positive.
> As such, 7.14 para 3 doesn't even preclude one of the required
> signals from being assigned to zero, it just says, via "which expand
> to positive integer constant expressions with type int and distinct
> values", if one uses it the others can't.

I suggest you try making that claim to the C committee and give them a good laugh.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
awk: FS matching 0 or more characters
Consider:

echo 'one!two!!three!!!end' | awk -v 'FS=!*' \
'{ for (i=NF; i>0; i--) print $i; }'

Onetrueawk, mawk, GNU awk, and Solaris awk all print:

    end
    three
    two
    one

However, Busybox awk prints:

    d
    n
    e
    e
    e
    r
    h
    t
    o
    w
    t
    e
    n
    o

In a way, the Busybox awk behaviour makes more sense. The "!*" ERE means: match zero or more "!", and that's exactly what it did.

Changing the ERE to '!+' makes all awks behave consistently, so that's the obvious fix.

But what, if anything, does POSIX have to say about an FS ERE matching zero or more characters?

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04

I can only find:

    1. If FS is a null string, the behavior is unspecified.

That doesn't really apply; FS is a non-null ERE, though one that may match the null string.

    3. [...] Each occurrence of a sequence matching the extended regular
    expression shall delimit fields.

Is a null string matching the ERE a "sequence" that matches it?

So at this point I'm not sure whether to report a bug in Busybox awk, or an area in the standard that needs further specification or clarification, or neither...

- Martijn

-- 
modernish -- harness the shell
https://github.com/modernish/modernish
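The '!+' fix mentioned above can be checked directly. That ERE requires at least one "!", so it can never match the empty string, and the awks listed in the report are expected to agree. A sketch, assuming any POSIX awk on $PATH:

```sh
#!/bin/sh
# Print the fields of the sample line in reverse order, splitting on runs of
# one or more "!" characters. '!+' cannot match the empty string, so the
# empty-match question never arises.
echo 'one!two!!three!!!end' |
  awk -v 'FS=!+' '{ for (i = NF; i > 0; i--) print $i }'
# prints end, three, two, one (one per line)
```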
Interpretations starting a 30 day review
All

Please note the following interpretations are starting a 30-day review. Comments back please no later than March 6 2020.

0001307: Base Definitions and Headers
am_pm value in locales that do not distinguish between am and pm (again)

0001309: Shell and Utilities
Clarity needed for initial value of $? at start of compound-list compound statements

regards
Andrew

Andrew Josey                    The Open Group
Austin Group Chair
Email: a.jo...@opengroup.org
Apex Plaza, Forbury Road, Reading, Berks. RG1 1AX, England

To learn how we maintain your privacy, please review The Open Group Privacy Statement at http://www.opengroup.org/privacy.
To unsubscribe/opt-out from this mailing list login to The Open Group collaboration portal at https://collaboration.opengroup.org/operational/portal.php?action=unsub&listid=2481
[1003.1(2016)/Issue7+TC2 0001307]: am_pm value in locales that do not distinguish between am and pm (again)
The following issue has been UPDATED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1307
======================================================================
Reported By:                geoffclare
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1307
Category:                   Base Definitions and Headers
Type:                       Clarification Requested
Severity:                   Comment
Priority:                   normal
Status:                     Interpretation Required
Name:                       Geoff Clare
Organization:               The Open Group
User Reference:
Section:                    7.3.5.1 LC_TIME Locale Definition
Page Number:                160
Line Number:                5085
Interp Status:              Proposed
Final Accepted Text:        See https://www.austingroupbugs.net/view.php?id=1307#c4762.
======================================================================
Date Submitted:             2019-12-18 15:35 UTC
Last Modified:              2020-02-03 21:14 UTC
======================================================================
Summary:                    am_pm value in locales that do not distinguish between am and pm (again)
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          081     am_pm value in locales that do not dist...
child of            466     date +%C problem
======================================================================

----------------------------------------------------------------------
 (0004767) ajosey (manager) - 2020-02-03 21:14
 https://www.austingroupbugs.net/view.php?id=1307#c4767
----------------------------------------------------------------------
Interpretation Proposed: 3 February 2020

Issue History
Date Modified    Username       Field                    Change
======================================================================
2019-12-18 15:35 geoffclare     New Issue
2019-12-18 15:35 geoffclare     Name                     => Geoff Clare
2019-12-18 15:35 geoffclare     Organization             => The Open Group
2019-12-18 15:35 geoffclare     Section                  => 7.3.5.1 LC_TIME Locale Definition
2019-12-18 15:35 geoffclare     Page Number              => 160
2019-12-18 15:35 geoffclare     Line Number              => 5085
2019-12-18 15:35 geoffclare     Interp Status            => ---
2019-12-18 15:36 geoffclare     Relationship added       related to 081
2019-12-18 15:38 geoffclare     Note Added: 0004688
2019-12-18 18:54 shware_systems Note Added: 0004689
2019-12-18 19:08 shware_systems Note Edited: 0004689
2020-01-30 16:57 Don Cragun     Note Added: 0004762
2020-01-30 16:58 eblake         Relationship added       child of 466
2020-01-30 16:59 Don Cragun     Interp Status            --- => Pending
2020-01-30 16:59 Don Cragun     Final Accepted Text      => See https://www.austingroupbugs.net/view.php?id=1307#c4762.
2020-01-30 16:59 Don Cragun     Status                   New => Interpretation Required
2020-01-30 16:59 Don Cragun     Resolution               Open => Accepted As Marked
2020-01-30 16:59 Don Cragun     Tag Attached: issue8
2020-02-03 21:14 ajosey         Interp Status            Pending => Proposed
2020-02-03 21:14 ajosey         Note Added: 0004767
======================================================================
[1003.1(2016)/Issue7+TC2 0001309]: Clarity needed for initial value of $? at start of compound-list compound statements
The following issue has been UPDATED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1309
======================================================================
Reported By:                kre
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1309
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Interpretation Required
Name:                       Robert Elz
Organization:
User Reference:
Section:                    2.9.4
Page Number:                2371-4
Line Number:                75726-31
Interp Status:              Proposed
Final Accepted Text:        https://www.austingroupbugs.net/view.php?id=1309#c4763
======================================================================
Date Submitted:             2019-12-19 02:26 UTC
Last Modified:              2020-02-03 21:13 UTC
======================================================================
Summary:                    Clarity needed for initial value of $? at start of compound-list compound statements
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0001150 exit status of command substitution not...
related to          051     sh exit status not clear for built-in t...
======================================================================

----------------------------------------------------------------------
 (0004766) ajosey (manager) - 2020-02-03 21:13
 https://www.austingroupbugs.net/view.php?id=1309#c4766
----------------------------------------------------------------------
Interpretation Proposed: 3 February 2020

Issue History
Date Modified    Username       Field                    Change
======================================================================
2019-12-19 02:26 kre            New Issue
2019-12-19 02:26 kre            Name                     => Robert Elz
2019-12-19 02:26 kre            Section                  => 2.9.4
2019-12-19 02:26 kre            Page Number              => 2371-4
2019-12-19 02:26 kre            Line Number              => 75726-31
2020-01-16 17:42 geoffclare     Note Added: 0004731
2020-01-16 17:43 geoffclare     Note Edited: 0004731
2020-01-16 20:35 kre            Note Added: 0004732
2020-01-16 21:36 kre            Note Added: 0004733
2020-01-17 04:17 kre            Note Added: 0004734
2020-01-17 04:19 kre            Note Edited: 0004734
2020-01-17 09:56 joerg          Note Added: 0004735
2020-01-17 10:31 kre            Note Added: 0004736
2020-01-17 15:39 geoffclare     Note Added: 0004737
2020-01-17 15:53 joerg          Note Added: 0004738
2020-01-17 15:56 joerg          Note Edited: 0004738
2020-01-17 16:04 joerg          Note Edited: 0004738
2020-01-17 16:17 geoffclare     Note Edited: 0004737
2020-01-18 02:38 kre            Note Added: 0004739
2020-01-20 11:57 geoffclare     Note Added: 0004741
2020-01-20 14:54 geoffclare     Note Added: 0004742
2020-01-20 14:55 geoffclare     Note Edited: 0004742
2020-01-20 14:58 geoffclare     Relationship added       related to 0001150
2020-01-20 15:04 geoffclare     Relationship added       related to 051
2020-01-20 18:37 kre            Note Added: 0004743
2020-01-23 14:55 geoffclare     Note Added: 0004744
2020-01-23 14:56 geoffclare     Note Edited: 0004744
2020-01-23 14:59 geoffclare     Note Edited: 0004744
2020-01-23 15:01 geoffclare     Note Edited: 0004744
2020-01-30 17:20 geoffclare     Note Added: 0004763
2020-01-30 17:21 geoffclare     Interp Status            => Pending
2020-01-30 17:21 geoffclare     Final Accepted Text      => https://www.austingroupbugs.net/view.php?id=1309#c4763
2020-01-30 17:21 geoffclare     Status                   New => Interpretation Required
2020-01-30 17:21 geoffclare     Resolution               Open => Accepted As Marked
2020-01-30 17:21 geoffclare     Tag Attached: issue8
2
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
2020-02-03 14:32:57 +0100, casper@oracle.com:
[...]
> Right. I think it may need some fine tuning but I think it is fine to
> avoid the shell when it is not needed.

Yes, at least (besides what's already done):

- that optimisation must be disabled if the first word is a builtin, special builtin, builtin alias or function, or keyword of the corresponding shell. (Beware that for /bin/sh (ksh93), the list of builtins depends on $PATH: if /opt/ast/bin is in front of $PATH, a few more builtins are enabled.)

    $ PATH=/opt/ast/bin:$PATH sh -c 'builtin;alias' | wc -l
    82

  (plus the keywords)

- it must be disabled if the code argument starts with - or +

- it must be disabled if the value of the $SHELL environment variable starts with r.

There may be other env variables (like _AST_FEATURES) that affect the way the shell parses and runs simple commands.

> I was not aware that ksh was all that dangerous; especially as it allows
> crossing privilege boundaries using environment variables.

It's not limited to ksh. In all shells, you mustn't use unsanitized data in arithmetic expressions. Some shells are worse than others. In dash, for instance, the exposure is limited to $(($var)) / $(($1)), and the damage is limited to assigning variables (var=PATH=7734). $((var)) there is OK (anything other than octal, hex or decimal constants with optional -/+ sign and blanks triggers an error).

> Not quite as bad as "Shellshock"; not even close. Still another reason to
> avoid the shell when it is not actually needed to start a new command.

The vulnerability in this case is not in the shell, but in the scripts using that feature (if they forget to sanitize data before using it in arithmetic context). The feature could be seen as a misfeature, though, as it makes it difficult to write safe shell code. That can't be fully fixed, though, as long as $(($1)) is required (by POSIX) to evaluate the arithmetic expression stored in the first positional parameter.
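One way to follow the advice to sanitize data before it reaches an arithmetic context is to accept only plain decimal integers. A minimal POSIX sh sketch, assuming that is all the script needs: is_decimal is a hypothetical helper, not a standard utility, and the loop values are made up for illustration. Leading zeros are rejected because shell arithmetic reads them as octal.

```sh
#!/bin/sh
# is_decimal STRING (hypothetical helper): succeed only if STRING is an
# optional "+" or "-" followed by decimal digits with no leading zero.
# Overflow is not checked.
is_decimal() {
  case $1 in
    [+-]*) set -- "${1#?}" ;;    # drop one leading sign character
  esac
  case $1 in
    0) return 0 ;;                # a lone zero is accepted
    ''|0*|*[!0-9]*) return 1 ;;   # empty, octal-looking, or non-digit
  esac
  return 0
}

# Only validated data reaches the arithmetic expansion.
for value in 41 -7 '1+1' 'a[$(evil)0]' 010; do
  if is_decimal "$value"; then
    printf '%s -> %s\n' "$value" "$((value + 1))"
  else
    printf 'rejected: %s\n' "$value"
  fi
done
```

Expected output: 41 -> 42, -7 -> -6, then three "rejected:" lines. Rejecting rather than rewriting suspicious input keeps the check simple and avoids guessing what the caller meant.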
> I'm not sure why we ended up in Solaris with 18 commands which are
> basically built-in ksh93 commands that make little sense as individual
> executables:
>
> alias    cd       fc       getopts  jobs     print    test     ulimit   unalias
> bg       command  fg       hash     kill     read     type     umask    wait
>
> It seems that is being tested in XPG4.os/procenv/confstr/
>
> The only ones that make sense are "kill" & "print".
[...]

Except for "print", that's a POSIX requirement (which many systems ignore), as non-special builtins have to be available as standalone commands (at least for exec*p(), env, find -exec, and all the commands that can execute commands).

-- 
Stephane
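To illustrate why non-special builtins need standalone executables: env and find -exec start the named utility themselves rather than through a shell, so no shell built-in can stand in for it. A sketch, assuming printf and test are installed as standalone utilities on $PATH (as on typical systems) and that /tmp exists:

```sh
#!/bin/sh
# env and find -exec start their utility with a $PATH search of their own,
# with no shell in between, so a shell's built-in version cannot be used:
# a standalone executable has to exist for this to work.
env printf '%s\n' 'printf run by env'

# Print /tmp only if the standalone test utility reports a directory.
find /tmp -prune -exec test -d {} \; -print
```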
[1003.1(2016)/Issue7+TC2 0001313]: Underline tags in strftime Application Usage
The following issue has been RESOLVED.
======================================================================
https://austingroupbugs.net/view.php?id=1313
======================================================================
Reported By:                dennisw
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1313
Category:                   System Interfaces
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     Resolved
Name:                       Dennis Wölfing
Organization:
User Reference:
Section:                    strftime
Page Number:                2049
Line Number:                65729
Interp Status:              ---
Final Accepted Text:        See https://austingroupbugs.net/view.php?id=1313#c4765.
Resolution:                 Accepted As Marked
Fixed in Version:
======================================================================
Date Submitted:             2020-01-02 13:57 UTC
Last Modified:              2020-02-03 16:31 UTC
======================================================================
Summary:                    Underline tags in strftime Application Usage
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2020-01-02 13:57 dennisw        New Issue
2020-01-02 13:57 dennisw        Name                     => Dennis Wölfing
2020-01-02 13:57 dennisw        Section                  => strftime
2020-01-02 13:57 dennisw        Page Number              => 2049
2020-01-02 13:57 dennisw        Line Number              => 65729
2020-02-03 16:30 Don Cragun     Note Added: 0004765
2020-02-03 16:31 Don Cragun     Interp Status            => ---
2020-02-03 16:31 Don Cragun     Final Accepted Text      => See https://austingroupbugs.net/view.php?id=1313#c4765.
2020-02-03 16:31 Don Cragun     Status                   New => Resolved
2020-02-03 16:31 Don Cragun     Resolution               Open => Accepted As Marked
======================================================================
[1003.1(2016)/Issue7+TC2 0001313]: Underline tags in strftime Application Usage
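A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1313
======================================================================
Reported By:                dennisw
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1313
Category:                   System Interfaces
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Dennis Wölfing
Organization:
User Reference:
Section:                    strftime
Page Number:                2049
Line Number:                65729
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2020-01-02 13:57 UTC
Last Modified:              2020-02-03 16:30 UTC
======================================================================
Summary:                    Underline tags in strftime Application Usage
======================================================================

----------------------------------------------------------------------
 (0004765) Don Cragun (manager) - 2020-02-03 16:30
 https://austingroupbugs.net/view.php?id=1313#c4765
----------------------------------------------------------------------
On page 2049 line 65729 section strftime, change:

(<+/->Y-MM-DD)

to:

(<+/->Y-MM-DD, i.e. with a 5 or more digit year)

Issue History
Date Modified    Username       Field                    Change
======================================================================
2020-01-02 13:57 dennisw        New Issue
2020-01-02 13:57 dennisw        Name                     => Dennis Wölfing
2020-01-02 13:57 dennisw        Section                  => strftime
2020-01-02 13:57 dennisw        Page Number              => 2049
2020-01-02 13:57 dennisw        Line Number              => 65729
2020-02-03 16:30 Don Cragun     Note Added: 0004765
======================================================================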
[1003.1(2016)/Issue7+TC2 0001311]: j command incorrectly referred to in ed's rationale section
The following issue has been RESOLVED.
======================================================================
https://austingroupbugs.net/view.php?id=1311
======================================================================
Reported By:                andras_farkas
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1311
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     Resolved
Name:                       Andras Farkas
Organization:
User Reference:
Section:                    ed
Page Number:                2689
Line Number:                87741
Interp Status:              ---
Final Accepted Text:
Resolution:                 Accepted
Fixed in Version:
======================================================================
Date Submitted:             2019-12-20 10:09 UTC
Last Modified:              2020-02-03 16:09 UTC
======================================================================
Summary:                    j command incorrectly referred to in ed's rationale section
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2019-12-20 10:09 andras_farkas  New Issue
2019-12-20 10:09 andras_farkas  Name                     => Andras Farkas
2019-12-20 10:09 andras_farkas  Section                  => ed
2019-12-20 10:09 andras_farkas  Page Number              => ed
2019-12-20 10:09 andras_farkas  Line Number              => 1200
2019-12-20 10:11 andras_farkas  Note Added: 0004694
2019-12-20 10:27 geoffclare     Page Number              ed => 2689
2019-12-20 10:27 geoffclare     Line Number              1200 => 87741
2019-12-20 10:27 geoffclare     Interp Status            => ---
2019-12-20 10:27 geoffclare     Description Updated
2019-12-20 10:27 geoffclare     Desired Action Updated
2019-12-20 10:27 shware_systems Note Added: 0004695
2019-12-20 10:29 geoffclare     Note Added: 0004696
2019-12-20 10:29 geoffclare     Description Updated
2019-12-20 10:30 geoffclare     Note Edited: 0004696
2019-12-20 10:33 andras_farkas  Note Added: 0004697
2020-01-12 01:46 andras_farkas  Note Added: 0004720
2020-02-03 16:09 Don Cragun     Status                   New => Resolved
2020-02-03 16:09 Don Cragun     Resolution               Open => Accepted
======================================================================
[1003.1(2016)/Issue7+TC2 0001312]: ctags -v example in ctags's rationale section missing a newline
The following issue has been RESOLVED.
======================================================================
https://austingroupbugs.net/view.php?id=1312
======================================================================
Reported By:                andras_farkas
Assigned To:
======================================================================
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1312
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     Resolved
Name:                       Andras Farkas
Organization:
User Reference:
Section:                    ctags
Page Number:                2625
Line Number:                85386
Interp Status:              ---
Final Accepted Text:
Resolution:                 Accepted
Fixed in Version:
======================================================================
Date Submitted:             2019-12-20 10:49 UTC
Last Modified:              2020-02-03 16:12 UTC
======================================================================
Summary:                    ctags -v example in ctags's rationale section missing a newline
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2019-12-20 10:49 andras_farkas  New Issue
2019-12-20 10:49 andras_farkas  Name                     => Andras Farkas
2019-12-20 10:49 andras_farkas  Section                  => ctags
2020-01-12 01:46 andras_farkas  Note Added: 0004721
2020-02-03 16:12 geoffclare     Page Number              => 2625
2020-02-03 16:12 geoffclare     Line Number              => 85386
2020-02-03 16:12 geoffclare     Interp Status            => ---
2020-02-03 16:12 geoffclare     Status                   New => Resolved
2020-02-03 16:12 geoffclare     Resolution               Open => Accepted
======================================================================
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
>"casper@oracle.com" wrote:
>> The only ones that make sense are "kill" & "print".
>
>I would say that "print" is not needed since it is not required to be callable
>via exec(), since it is a ksh88/ksh93 private builtin.

Right, "print" is not tested for in the test suite.

Casper
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
"casper@oracle.com" wrote:

> I'm not sure why we ended up in Solaris with 18 commands which are
> basically built-in ksh93 commands that make little sense as individual
> executables:
>
> alias    cd       fc       getopts  jobs     print    test     ulimit   unalias
> bg       command  fg       hash     kill     read     type     umask    wait
>
> It seems that is being tested in XPG4.os/procenv/confstr/
>
> The only ones that make sense are "kill" & "print".

I would say that "print" is not needed since it is not required to be callable via exec(), since it is a ksh88/ksh93 private builtin.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
     joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
>2020-02-03 12:40:45 +0100, Joerg Schilling:
>[...]
>> > It looks like it's caused by an "optimisation" in its
>> > libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
>> > after all.
>>
>> To which Solaris version does this apply?
>
>That was 11.4

Yes.

>> > $ ksh -c 'printf %d 1+1'
>> > printf: 1+1 not completely converted
>>
>> This is the correct expected output for /usr/bin/printf
>
>Yes, that's the point, /usr/bin/printf was called instead of ksh
>(ksh93 here) and its builtin.
>
>> > What? ksh's printf does take arithmetic expressions as arguments
>> > for %d.
>> >
>> > $ ksh -c 'printf %d 1+1;'
>> > 2
>> > $ ksh -c 'printf %d 1+1' ksh
>> > 2
>> >
>> > Adding that ; special shell character or an extra argument
>> > disables the optimisation.
>>
>> But this seems to be an Easter egg from ksh93.
>[...]
>
>printf %d 1+1 to output 2 is expected in ksh where in most
>places where a number is expected, any arithmetic expression is
>accepted as well. That behaviour was also copied by zsh.
>
>It causes all sorts of security headaches as arithmetic expressions can assign
>variables (like for IFS=1234567890, PATH=7734) or run arbitrary code (like
>a[$(evil)0])
>
>$ a=2 b='a[$(evil)0]' ksh -c 'printf %d b' # /usr/bin/printf run
>printf: b expected numeric value
>$ a=2 b='a[$(evil)0]' ksh -c 'printf "%d" b' # ksh printf run
>ksh: printf: evil: not found [No such file or directory]
>
>The Easter egg here is more Solaris libc:exec*() bypassing the
>execution of a shell in some cases.

Right. I think it may need some fine tuning but I think it is fine to avoid the shell when it is not needed.

I was not aware that ksh was all that dangerous; especially as it allows crossing privilege boundaries using environment variables.

Not quite as bad as "Shellshock"; not even close. Still another reason to avoid the shell when it is not actually needed to start a new command.
I'm not sure why we ended up in Solaris with 18 commands which are basically built-in ksh93 commands that make little sense as individual executables:

alias    cd       fc       getopts  jobs     print    test     ulimit   unalias
bg       command  fg       hash     kill     read     type     umask    wait

It seems that is being tested in XPG4.os/procenv/confstr/

The only ones that make sense are "kill" & "print".

Casper
Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines
    Date:        Fri, 31 Jan 2020 23:18:31 +0100
    From:        Steffen Nurpmeso
    Message-ID:  <20200131221831.vapcz%stef...@sdaoden.eu>

  | May i ask whether you have numbers on how often "." is used with
  | $PATH searching?

Of course you may ask.

But to save you actually doing that, no, I have no idea; like many other things in this area, all that we really know is that it might be, as that's how it was specified to work - even if none of us usually ever use it that way. But the relevant change here was identical for both "." and "exec", and "exec" using PATH search is rather more likely.

  | But, then: for explicit relative
  | file names, using ./ is a way to accomplish escaping.

Of course, but if that were adequate for the original issue, the reported problem would not have needed a change at all. That's a different kettle of fish. I'm assuming (without any personal evidence of it) that there actually is/was a problem to fix, and then simply asking about whether the fix that was made perhaps overdoes things a little (requires more than was strictly necessary).

kre
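The ./ escaping mentioned above can be sketched as follows. With no slash in its operand, "." searches $PATH, and in a shell whose "." follows the Utility Syntax Guidelines an operand starting with "-" may be parsed as an option; a ./ prefix avoids both. The source_relative function and the file names are hypothetical, used only for this illustration.

```sh
#!/bin/sh
# source_relative FILE (hypothetical helper): source FILE without a $PATH
# search and without FILE being mistaken for an option to ".".
source_relative() {
  case $1 in
    /*) . "$1" ;;        # absolute path: contains a slash, no $PATH search
    *)  . "./$1" ;;      # relative: ./ disables $PATH search and option parsing
  esac
}

dir=${TMPDIR:-/tmp}/dot-demo.$$
mkdir "$dir" || exit 1
printf 'greeting=hello\n' > "$dir/-settings"
(cd "$dir" && source_relative -settings && printf '%s\n' "$greeting")
rm -rf "$dir"
```

The subshell prints hello, and the temporary directory is then removed.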
Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)
Stephane CHAZELAS wrote:

> (what's "whatwhell" by the way? or do you mean Sven Mascheck's
> "whatshell"?)

Correct, sticky fingers ;-)

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
     joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Re: raise(0) (was: Exit status 128)
Shware Systems wrote, on 03 Feb 2020:
>
>> C99 only specifies the behaviour of raise() and signal() for SIGABRT,
>> SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM. The behaviour for all
>> other "sig" argument values is either implementation-defined or undefined.
>> 7.14 para 4 says "The complete set of signals, their semantics, and
>> their default handling is implementation-defined; all signal numbers
>> shall be positive." This means that "the complete set of signals"
>> for which an implementation defines the behaviour cannot include 0,
>> because 0 is not positive (see below). Thus the behaviour for 0 is
>> not implementation-defined and must therefore be undefined.
>> Yes, 0 is not "positive". When used on its own, "positive" means
>> "greater than zero", ...
> [Zero] has the same sign bit value as other positive values so is
> positive.
> As such, 7.14 para 3 doesn't even preclude one of the required
> signals from being assigned to zero, it just says, via "which expand
> to positive integer constant expressions with type int and distinct
> values", if one uses it the others can't.

I suggest you try making that claim to the C committee and give them a good laugh.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: About printf %2$s
Joerg Schilling wrote, on 03 Feb 2020:
>
> > I don't know where the %2$x format in printf(3) comes from.
>
> Well, from my private history memory, I have in mind that Sun introduced it
> in the 1980s, when the basics for gettext(3) have been created. So this must
> have been no later than for SunOS-4.0. I believe, there was a related talk on
> a Sun User Group meeting that time with examples for printing date strings.

In XPG2 there was a separate nl_printf() function which handled this format (and only this format - you had to call printf() to use %x and nl_printf() to use %2$x). It was in the same "NLS" (Native Language Support) section of the spec as nl_init(), nl_langinfo() and catopen(). XPG2 was published in Jan 1987, so the work on specifying nl_printf() would have been done in the year or so before that. I don't know if it was based on an existing implementation.

The functionality from nl_printf() was merged into printf() in XPG3. (And nl_init() was dropped in favour of the new setlocale() function from the draft C standard.)

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
2020-02-03 12:40:45 +0100, Joerg Schilling:
[...]
> > It looks like it's caused by an "optimisation" in its
> > libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
> > after all.
>
> To which Solaris version does this apply?

That was 11.4.

> > $ ksh -c 'printf %d 1+1'
> > printf: 1+1 not completely converted
>
> This is the correct expected output for /usr/bin/printf

Yes, that's the point: /usr/bin/printf was called instead of ksh (ksh93 here) and its builtin printf.

> > What? ksh's printf does take arithmetic expressions as arguments
> > for %d.
> >
> > $ ksh -c 'printf %d 1+1;'
> > 2
> > $ ksh -c 'printf %d 1+1' ksh
> > 2
> >
> > Adding that ; special shell character or an extra argument
> > disables the optimisation.
>
> But this seems to be an Easter egg from ksh93.
[...]

That printf %d 1+1 outputs 2 is expected in ksh, where in most places where a number is expected, any arithmetic expression is accepted as well. That behaviour was also copied by zsh.

It causes all sorts of security headaches, as arithmetic expressions can assign variables (like for IFS=1234567890, PATH=7734) or run arbitrary code (like a[$(evil)0]):

$ a=2 b='a[$(evil)0]' ksh -c 'printf %d b' # /usr/bin/printf run
printf: b expected numeric value
$ a=2 b='a[$(evil)0]' ksh -c 'printf "%d" b' # ksh printf run
ksh: printf: evil: not found [No such file or directory]

The Easter egg here is rather the Solaris libc:exec*() bypassing the execution of a shell in some cases.

--
Stephane
Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)
2020-02-03 11:43:38 +0100, Joerg Schilling:
[...]
> > $ /usr/xpg4/bin/sh -c 'type printf'
> > printf is a shell builtin
>
> This does not apply to OpenSolaris, but on OpenSolaris, this was closed
> source as ksh88 is not available under OSS license.
>
> This also does not apply to Oracle Solaris 11.3, so where did you test?
>
> Could you run "whatwhell" with this shell please?
[...]

That was Solaris 11.4 in a VM as freshly downloaded from Oracle.

Yes, it's based on ksh88, but note that as seen in my follow-up messages on austin-group-l (https://www.mail-archive.com/austin-group-l@opengroup.org/msg05548.html, https://www.mail-archive.com/austin-group-l@opengroup.org/msg05549.html; I had trimmed the recipient list by then as I don't expect it's of much interest to GNU gettext), the system actually didn't run /usr/xpg4/bin/sh at all in that case but ran /usr/bin/type printf instead, where /usr/bin/type appears to be some special build of ksh93. That explains why it said printf was a builtin.

If you don't get the same, then possibly that (undocumented, AFAICT) bypassing of sh is new in 11.4, or you have /usr/xpg4/bin ahead of /bin and/or /usr/bin in your $PATH, in which case /usr/xpg4/bin/type is called instead.

(What's "whatwhell", by the way? Or do you mean Sven Mascheck's "whatshell"?)

--
Stephane
Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)
Stephane Chazelas wrote:
> 2020-02-01 10:47:46 +, Stephane Chazelas:
> [...]
> > That doesn't explain why it's different with ${0+type} or when
> > there's more than the one invocation of "type" in the script.
> [...]
>
> OK, I see what's going on.
>
> It looks like it's caused by an "optimisation" in its
> libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
> after all.

To which Solaris version does this apply?

> From what I can gather from my tests, when exec*()'s filename
> argument is /bin/sh or any of its other paths (/usr/bin/sh,
> /bin/ksh, /usr/bin/ksh93, /bin/./sh...) or /usr/xpg4/bin/sh,
> (but not /usr/xpg4/bin/./sh), the first argument (argv[0]) is
> anything that doesn't start with "r", including "-sh", the
> second is "-c" (not "-cc", not "-uc"...) and the third is some
> very simple shell code, that doesn't contain non-ASCII
> characters nor shell special characters other than spc and tab,
> and no further argument then
> exec*() takes the shell's role at parsing the command line,
> splits it on spc and tab and tries to execute the corresponding
> command by itself. If it can't do it (command not found for
> instance), then it falls back to executing the shell normally.

Strange.

> $ ksh -c 'printf %d 1+1'
> printf: 1+1 not completely converted

This is the correct expected output for /usr/bin/printf

> What? ksh's printf does take arithmetic expressions as arguments
> for %d.
>
> $ ksh -c 'printf %d 1+1;'
> 2
> $ ksh -c 'printf %d 1+1' ksh
> 2
>
> Adding that ; special shell character or an extra argument
> disables the optimisation.

But this seems to be an Easter egg from ksh93.

Jörg

--
 EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
 joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
RE: raise(0) (was: Exit status 128)
The C standard does not distinguish zero as a separate, single valued, domain. Some math theories, academically, characterize it that way but those are not the basis of any of the 3 signed integer forms. It has the same sign bit value as other positive values so is positive. Negative zero, for the two forms that support it, has the complementary sign bit value. Also, the domains of unsigned types are considered positive values too and these all include zero. As such, 7.14 para 3 doesn't even preclude one of the required signals from being assigned to zero, it just says, via "which expand to positive integer constant expressions with type int and distinct values", if one uses it the others can't.

It is POSIX that disallows zero as a CX modification of the requirements, so the usage of it by a platform would be extension behavior, as I note, but the behavior expected is fully defined.

On Monday, February 3, 2020 Geoff Clare wrote:

Shware Systems wrote, on 31 Jan 2020:
>
> Subject: Re: Exit status 128 [was: exit status for false should be 1-125]
>
> The value 128 is potentially special to platforms implementing
> extensions, as it corresponds to the signo 0. While POSIX uses this as
> the 'validate pid' function of kill(), the wording in the C standard
> for raise() requires a raise(0), especially when the encoding for int
> distinguishes positive and negative zero values, to be delivered to a
> process for handling by an application's signal handler.

C99 only specifies the behaviour of raise() and signal() for SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM. The behaviour for all other "sig" argument values is either implementation-defined or undefined.

7.14 para 4 says "The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive." This means that "the complete set of signals" for which an implementation defines the behaviour cannot include 0, because 0 is not positive (see below).
Thus the behaviour for 0 is not implementation-defined and must therefore be undefined.

Yes, 0 is not "positive". When used on its own, "positive" means "greater than zero", and likewise "negative" means "less than zero". I.e. +0 is zero (with the sign bit unset), it is not "positive", and -0 is zero (with the sign bit set), it is not "negative". The terms "positive zero" and "negative zero" are misleading and should be avoided in formal use.

--
Geoff Clare
Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)
Stephane Chazelas wrote:
> 2020-01-24 15:14:48 +0100, Joerg Schilling:
> > printf "Hello World %2$s %1$s\\n" 1 2
> [...]
> > mksh    a ksh88 clone
> [...]
>
> ksh88 had no printf builtin.

OK, this is a mistake that frequently happens because most shells have a builtin printf.

> You might have been misled by Solaris' /usr/xpg4/bin/sh

I believe that Oracle Solaris 11 still uses ksh88, as it is harder to make ksh93 POSIX compliant than to work with the ksh88 variant that has already been adopted.

> On Solaris 11,
>
> $ /usr/xpg4/bin/sh -c 'type printf'
> printf is a shell builtin

This does not apply to OpenSolaris, but on OpenSolaris, this was closed source as ksh88 is not available under OSS license.

This also does not apply to Oracle Solaris 11.3, so where did you test?

Could you run "whatwhell" with this shell please?

> mksh has no printf builtin either

I know but I was confused.

> AFAIK, the printf utility is a POSIX invention (ksh93 release
> notes do mention the POSIX origin), possibly inspired by
> research Unix 10th edition which had a printf utility (but not
> %b for instance) possibly from as far back as

I believe that %b was a POSIX invention. SVr4 already had a printf(1) in 1988, but this was a 10-liner that just called printf(3) and thus could only handle strings.

> 1986 (if we're to believe the timestamp of printf.c at
> https://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v10/v10src.tar.bz2
> $ tar tvf v10src.tar.bz2 cmd/printf.c
> -rw-rw-r-- root/root 3621 1986-07-29 20:40 cmd/printf.c

This is more than SVr4 had in 1988.

> I don't know where the %2$x format in printf(3) comes from.

Well, from my private history memory, I have in mind that Sun introduced it in the 1980s, when the basics for gettext(3) were created. So this must have been no later than for SunOS-4.0. I believe, there was a related talk on a Sun User Group meeting that time with examples for printing date strings.
It was in SVr4 from the beginning, and since it was in printf(3) on SVr4, it also applies to the early printf(1) in SVr4.

Jörg
raise(0) (was: Exit status 128)
Shware Systems wrote, on 31 Jan 2020:
>
> Subject: Re: Exit status 128 [was: exit status for false should be 1-125]
>
> The value 128 is potentially special to platforms implementing
> extensions, as it corresponds to the signo 0. While POSIX uses this as
> the 'validate pid' function of kill(), the wording in the C standard
> for raise() requires a raise(0), especially when the encoding for int
> distinguishes positive and negative zero values, to be delivered to a
> process for handling by an application's signal handler.

C99 only specifies the behaviour of raise() and signal() for SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM. The behaviour for all other "sig" argument values is either implementation-defined or undefined.

7.14 para 4 says "The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive." This means that "the complete set of signals" for which an implementation defines the behaviour cannot include 0, because 0 is not positive (see below). Thus the behaviour for 0 is not implementation-defined and must therefore be undefined.

Yes, 0 is not "positive". When used on its own, "positive" means "greater than zero", and likewise "negative" means "less than zero". I.e. +0 is zero (with the sign bit unset), it is not "positive", and -0 is zero (with the sign bit set), it is not "negative". The terms "positive zero" and "negative zero" are misleading and should be avoided in formal use.

--
Geoff Clare