Re: awk: FS matching 0 or more characters

2020-02-03 Thread Stephane Chazelas
2020-02-03 15:10:29 -0800, Don Cragun:
[...]
>   "The search for a matching sequence starts at the beginning
>   of a string and stops when the first sequence matching the
>   expression is found, where ``first’’ is defined to mean
> * ``begins earliest in the string’’. If the pattern permits
> * a variable number of matching characters and thus there is
> * more than one such sequence starting at that point, the
> * longest such sequence is matched. For example, the BRE
>   "bb*" matches the second to fourth characters of the
>   string "abbbc", and the ERE "(wee|week)(knights|night)"
>   matches all ten characters of the string "weeknights".
> 
> * "Consistent with the whole match being the longest of the
> * leftmost matches, each subpattern, from left to right,
> * shall match the longest possible string. For this purpose,
> * a null string shall be considered to be longer than no
> * match at all. For example, matching the BRE "\(.*\).*"
> * against "abcdef", the subexpression "(\1)" is "abcdef",
> * and matching the BRE "\(a*\)*" against "bc", the
> * subexpression "(\1)" is the null string.
> 
>   "When a multi-character collating element in a bracket
>   expression (see Section 9.3.5, on page 184) is involved,
>   the longest sequence shall be measured in characters
>   consumed from the string to be matched; that is, the
>   collating element counts not as one element, but as the
>   number of characters it matches."
> 
> Noting the part of this definition that is on lines shown above
> with a leading asterisk, I believe the standard is clear and
> that Busybox awk does not conform.
[...]

Not sure how you reached that conclusion. It seems to me on the
contrary that that text alone would mean that busybox awk is the
only compliant implementation.

When it comes to sed or grep, all implementations agree with
busybox awk.

$ echo bbb | gsed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | busybox sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | solaris-sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo bbb | solaris-xpg4-sed 's/a*/<&>/g'
<>b<>b<>b<>
$ echo aaa | grep 'b*'
aaa

The special behaviour of the original awk, mawk and gawk is,
AFAICT, an undocumented deviation that seems to apply only to FS
processing (and split()).

sub(), gsub(), match() and /.../ will happily match an empty string.

$ echo bbb | /usr/xpg4/bin/awk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | gawk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | mawk '{gsub(/a*/, "<&>"); print}'
<>b<>b<>b<>
$ echo bbb | gawk '/a*/'
bbb

To account for those implementations, POSIX should say that when
the split regexp matches an empty string, it's undefined whether
that empty string is taken as a field separator or ignored (and
in any case, matching resumes at the next character, not at the
end of the matched text, otherwise it would loop indefinitely
(like ast-open's grep -o 'a*' does)).

-- 
Stephane



Re: awk: FS matching 0 or more characters

2020-02-03 Thread Don Cragun
Hi Martijn,
In the description of REs in the standard, "match" is described (on
P181-182, L5969-5993 in the 2017 edition of the standard) as:
"A sequence of zero or more characters shall be said to
be matched by a BRE or ERE when the characters in the
sequence correspond to a sequence of characters defined by
the pattern.

"Matching shall be based on the bit pattern used for encoding
the character, not on the graphic representation of the
character. This means that if a character set contains two
or more encodings for a graphic symbol, or if the strings
searched contain text encoded in more than one codeset, no
attempt is made to search for any other representation of
the encoded symbol. If that is required, the user can
specify equivalence classes containing all variations of
the desired graphic symbol.

"The search for a matching sequence starts at the beginning
of a string and stops when the first sequence matching the
expression is found, where ``first’’ is defined to mean
*   ``begins earliest in the string’’. If the pattern permits
*   a variable number of matching characters and thus there is
*   more than one such sequence starting at that point, the
*   longest such sequence is matched. For example, the BRE
"bb*" matches the second to fourth characters of the
string "abbbc", and the ERE "(wee|week)(knights|night)"
matches all ten characters of the string "weeknights".

*   "Consistent with the whole match being the longest of the
*   leftmost matches, each subpattern, from left to right,
*   shall match the longest possible string. For this purpose,
*   a null string shall be considered to be longer than no
*   match at all. For example, matching the BRE "\(.*\).*"
*   against "abcdef", the subexpression "(\1)" is "abcdef",
*   and matching the BRE "\(a*\)*" against "bc", the
*   subexpression "(\1)" is the null string.

"When a multi-character collating element in a bracket
expression (see Section 9.3.5, on page 184) is involved,
the longest sequence shall be measured in characters
consumed from the string to be matched; that is, the
collating element counts not as one element, but as the
number of characters it matches."

Noting the part of this definition that is on lines shown above
with a leading asterisk, I believe the standard is clear and
that Busybox awk does not conform.

Hope this helps,
Don

> On Feb 3, 2020, at 1:50 PM, Martijn Dekker  wrote:
> 
> Consider:
> 
> echo 'one!two!!three!!!end' | awk -v 'FS=!*' \
>   '{ for (i=NF; i>0; i--) print $i; }'
> 
> Onetrueawk, mawk, GNU awk, and Solaris awk all print:
> 
>> end
>> three
>> two
>> one
> 
> However, Busybox awk prints:
> 
>> d
>> n
>> e
>> e
>> e
>> r
>> h
>> t
>> o
>> w
>> t
>> e
>> n
>> o
> 
> In a way, the Busybox awk behaviour makes more sense. The "!*" ERE means: 
> match zero or more "!", and that's exactly what it did.
> 
> Changing the ERE to '!+' makes all awks behave consistently, so that's the 
> obvious fix.
> 
> But what, if anything, does POSIX have to say about an FS ERE matching zero 
> or more characters?
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04
> 
> I can only find:
> 
>> 1. If FS is a null string, the behavior is unspecified.
> 
> That doesn't really apply; FS is a non-null ERE, though one that may match 
> the null string.
> 
>> 3. [...] Each occurrence of a sequence matching the extended regular
>> expression shall delimit fields.
> 
> Is a null string matching the ERE a "sequence" that matches it?
> 
> So at this point I'm not sure whether to report a bug in Busybox awk, or an 
> area in the standard that needs further specification or clarification, or 
> neither...
> 
> - Martijn
> 
> -- 
> modernish -- harness the shell
> https://github.com/modernish/modernish
> 




Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines

2020-02-03 Thread Steffen Nurpmeso
Robert Elz wrote in <28486.1580735...@jinx.noi.kre.to>:
 |Date: Fri, 31 Jan 2020 23:18:31 +0100
 |From: Steffen Nurpmeso
 |Message-ID: <20200131221831.vapcz%stef...@sdaoden.eu>
 |
 || May i ask whether you have numbers on how often "." is used with
 || $PATH searching?
 |
 |Of course you may ask.
 |
 |But to save you actually doing that, no, I have no idea, like many
 |other things in this area all that we really know is that it might
 |be, as that's how it was specified to work - even if none of us
 |usually ever use it that way.
 |But the relevant change here was identical for both "." and "exec"
 |and "exec" using PATH search is rather more likely.
 |
 || But, then: for explicit relative
 || file names, using ./ is a way to accomplish escaping.
 |
 |Of course, but if that were adequate for the original issue, the
 |reported problem would not have needed a change at all.   That's
 |a different kettle of fish.   I'm assuming (without any personal
 |evidence of it) that there actually is/was a problem to fix, and
 |then simply asking about whether the fix that was made perhaps
 |overdoes things a little (requires more than was strictly necessary).

I see, after having read the issue, it was from 2010.  I use
functions which do the path search for me, you and Stephane have
fixed bugs in it, as you possibly remember.  This is because
command -v is not really usable.  So i, among others, have found
ways to work around issues.  I have found only one script which
does './exec "$shellvar"' where $shellvar is not a readily
prepared path, and that is in a release script which is known to
run locally.  But yes, of course, having -- is an improvement.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: raise(0) (was: Exit status 128)

2020-02-03 Thread Shware Systems

No, they won't laugh. If the intent was to exclude a "normal zero", as
6.2.6.2p3 refers to it, the text "positive non-zero value" is explicitly
required. Since it isn't there, which I'm fine with, maybe someone else should
file a defect report with them.

On Monday, February 3, 2020 Geoff Clare wrote:
Shware Systems  wrote, on 03 Feb 2020:
>
>> C99 only specifies the behaviour of raise() and signal() for SIGABRT,
>> SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM.  The behaviour for all
>> other "sig" argument values is either implementation-defined or undefined.
>> 7.14 para 4 says "The complete set of signals, their semantics, and
>> their default handling is implementation-defined; all signal numbers
>> shall be positive."  This means that "the complete set of signals"
>> for which an implementation defines the behaviour cannot include 0,
>> because 0 is not positive (see below).  Thus the behaviour for 0 is
>> not implementation-defined and must therefore be undefined.

>> Yes, 0 is not "positive".  When used on its own, "positive" means
>> "greater than zero", ...

> [Zero] has the same sign bit value as other positive values so is
> positive.

> As such, 7.14 para 3 doesn't even preclude one of the required
> signals from being assigned to zero, it just says, via "which expand
> to positive integer constant expressions with type int and distinct
> values", if one uses it the others can't.

I suggest you try making that claim to the C committee and give them a
good laugh.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



awk: FS matching 0 or more characters

2020-02-03 Thread Martijn Dekker

Consider:

echo 'one!two!!three!!!end' | awk -v 'FS=!*' \
'{ for (i=NF; i>0; i--) print $i; }'

Onetrueawk, mawk, GNU awk, and Solaris awk all print:


end
three
two
one


However, Busybox awk prints:



d
n
e

e
e
r
h
t

o
w
t

e
n
o


In a way, the Busybox awk behaviour makes more sense. The "!*" ERE 
means: match zero or more "!", and that's exactly what it did.


Changing the ERE to '!+' makes all awks behave consistently, so that's 
the obvious fix.
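
A minimal sketch of that fix, using the sample line from above; the
function name reverse_fields is made up for this illustration. With
FS set to '!+', a separator needs at least one "!", so the ERE can
never match the null string:

```shell
# Hypothetical helper: print the "!"-separated fields of $1 in reverse order.
# FS='!+' only matches runs of one or more "!", never an empty string.
reverse_fields() {
    printf '%s\n' "$1" |
        awk -v 'FS=!+' '{ for (i = NF; i > 0; i--) print $i }'
}

reverse_fields 'one!two!!three!!!end'
```

With this FS the awks mentioned here should all print end, three,
two and one, each on its own line.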


But what, if anything, does POSIX have to say about an FS ERE matching 
zero or more characters?


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04

I can only find:


1. If FS is a null string, the behavior is unspecified.


That doesn't really apply; FS is a non-null ERE, though one that may 
match the null string.



3. [...] Each occurrence of a sequence matching the extended regular
expression shall delimit fields.


Is a null string matching the ERE a "sequence" that matches it?

So at this point I'm not sure whether to report a bug in Busybox awk, or 
an area in the standard that needs further specification or 
clarification, or neither...


- Martijn

--
modernish -- harness the shell
https://github.com/modernish/modernish



Interpretations starting a 30 day review

2020-02-03 Thread Andrew Josey
All
Please note the following interpretations are starting a 30 day review. 
Comments back please no later than March 6 2020.


0001307: Base Definitions and Headers  am_pm value in locales that do not 
distinguish between am and pm (again)
0001309: Shell and Utilities  Clarity needed for initial value of $? at start 
of compound-list compound statements

regards
Andrew



Andrew Josey                    The Open Group
Austin Group Chair
Email: a.jo...@opengroup.org
Apex Plaza, Forbury Road, Reading, Berks. RG1 1AX, England









[1003.1(2016)/Issue7+TC2 0001307]: am_pm value in locales that do not distinguish between am and pm (again)

2020-02-03 Thread Austin Group Bug Tracker


The following issue has been UPDATED. 
== 
https://www.austingroupbugs.net/view.php?id=1307 
== 
Reported By:          geoffclare
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1307
Category:             Base Definitions and Headers
Type:                 Clarification Requested
Severity:             Comment
Priority:             normal
Status:               Interpretation Required
Name:                 Geoff Clare
Organization:         The Open Group
User Reference:
Section:              7.3.5.1 LC_TIME Locale Definition
Page Number:          160
Line Number:          5085
Interp Status:        Proposed
Final Accepted Text:  See
                      https://www.austingroupbugs.net/view.php?id=1307#c4762.
==
Date Submitted:       2019-12-18 15:35 UTC
Last Modified:        2020-02-03 21:14 UTC
==
Summary:              am_pm value in locales that do not distinguish
                      between am and pm (again)
==
Relationships         ID       Summary
--
related to            081      am_pm value in locales that do not dist...
child of              466      date +%C problem
== 

-- 
 (0004767) ajosey (manager) - 2020-02-03 21:14
 https://www.austingroupbugs.net/view.php?id=1307#c4767 
-- 
Interpretation Proposed: 3 February 2020 

Issue History 
Date Modified     Username        Field                    Change
==
2019-12-18 15:35  geoffclare      New Issue
2019-12-18 15:35  geoffclare      Name                     => Geoff Clare
2019-12-18 15:35  geoffclare      Organization             => The Open Group
2019-12-18 15:35  geoffclare      Section                  => 7.3.5.1 LC_TIME
                                                           Locale Definition
2019-12-18 15:35  geoffclare      Page Number              => 160
2019-12-18 15:35  geoffclare      Line Number              => 5085
2019-12-18 15:35  geoffclare      Interp Status            => ---
2019-12-18 15:36  geoffclare      Relationship added       related to 081
2019-12-18 15:38  geoffclare      Note Added: 0004688
2019-12-18 18:54  shware_systems  Note Added: 0004689
2019-12-18 19:08  shware_systems  Note Edited: 0004689
2020-01-30 16:57  Don Cragun      Note Added: 0004762
2020-01-30 16:58  eblake          Relationship added       child of 466
2020-01-30 16:59  Don Cragun      Interp Status            --- => Pending
2020-01-30 16:59  Don Cragun      Final Accepted Text      => See
                  https://www.austingroupbugs.net/view.php?id=1307#c4762.
2020-01-30 16:59  Don Cragun      Status                   New => Interpretation
                                                           Required
2020-01-30 16:59  Don Cragun      Resolution               Open => Accepted As
                                                           Marked
2020-01-30 16:59  Don Cragun      Tag Attached: issue8
2020-02-03 21:14  ajosey          Interp Status            Pending => Proposed
2020-02-03 21:14  ajosey          Note Added: 0004767
==




[1003.1(2016)/Issue7+TC2 0001309]: Clarity needed for initial value of $? at start of compound-list compound statements

2020-02-03 Thread Austin Group Bug Tracker


The following issue has been UPDATED. 
== 
https://www.austingroupbugs.net/view.php?id=1309 
== 
Reported By:          kre
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1309
Category:             Shell and Utilities
Type:                 Enhancement Request
Severity:             Objection
Priority:             normal
Status:               Interpretation Required
Name:                 Robert Elz
Organization:
User Reference:
Section:              2.9.4
Page Number:          2371-4
Line Number:          75726-31
Interp Status:        Proposed
Final Accepted Text:
                      https://www.austingroupbugs.net/view.php?id=1309#c4763
==
Date Submitted:       2019-12-19 02:26 UTC
Last Modified:        2020-02-03 21:13 UTC
==
Summary:              Clarity needed for initial value of $? at start of
                      compound-list compound statements
==
Relationships         ID       Summary
--
related to            0001150  exit status of command substitution not...
related to            051      sh exit status not clear for built-in t...
== 

-- 
 (0004766) ajosey (manager) - 2020-02-03 21:13
 https://www.austingroupbugs.net/view.php?id=1309#c4766 
-- 
Interpretation Proposed: 3 February 2020 

Issue History 
Date Modified     Username        Field                    Change
==
2019-12-19 02:26  kre             New Issue
2019-12-19 02:26  kre             Name                     => Robert Elz
2019-12-19 02:26  kre             Section                  => 2.9.4
2019-12-19 02:26  kre             Page Number              => 2371-4
2019-12-19 02:26  kre             Line Number              => 75726-31
2020-01-16 17:42  geoffclare      Note Added: 0004731
2020-01-16 17:43  geoffclare      Note Edited: 0004731
2020-01-16 20:35  kre             Note Added: 0004732
2020-01-16 21:36  kre             Note Added: 0004733
2020-01-17 04:17  kre             Note Added: 0004734
2020-01-17 04:19  kre             Note Edited: 0004734
2020-01-17 09:56  joerg           Note Added: 0004735
2020-01-17 10:31  kre             Note Added: 0004736
2020-01-17 15:39  geoffclare      Note Added: 0004737
2020-01-17 15:53  joerg           Note Added: 0004738
2020-01-17 15:56  joerg           Note Edited: 0004738
2020-01-17 16:04  joerg           Note Edited: 0004738
2020-01-17 16:17  geoffclare      Note Edited: 0004737
2020-01-18 02:38  kre             Note Added: 0004739
2020-01-20 11:57  geoffclare      Note Added: 0004741
2020-01-20 14:54  geoffclare      Note Added: 0004742
2020-01-20 14:55  geoffclare      Note Edited: 0004742
2020-01-20 14:58  geoffclare      Relationship added       related to 0001150
2020-01-20 15:04  geoffclare      Relationship added       related to 051
2020-01-20 18:37  kre             Note Added: 0004743
2020-01-23 14:55  geoffclare      Note Added: 0004744
2020-01-23 14:56  geoffclare      Note Edited: 0004744
2020-01-23 14:59  geoffclare      Note Edited: 0004744
2020-01-23 15:01  geoffclare      Note Edited: 0004744
2020-01-30 17:20  geoffclare      Note Added: 0004763
2020-01-30 17:21  geoffclare      Interp Status            => Pending
2020-01-30 17:21  geoffclare      Final Accepted Text      =>
                  https://www.austingroupbugs.net/view.php?id=1309#c4763
2020-01-30 17:21  geoffclare      Status                   New => Interpretation
                                                           Required
2020-01-30 17:21  geoffclare      Resolution               Open => Accepted As
                                                           Marked
2020-01-30 17:21  geoffclare      Tag Attached: issue8
2

Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Stephane CHAZELAS
2020-02-03 14:32:57 +0100, casper@oracle.com:
[...]
> Right.  I think it may need some fine tuning but I think it is fine to 
> avoid the shell when it is not needed.

Yes, at least (beside what's already done):

- that optimisation must be disabled if the first word is a
  builtin, special builtin, builtin alias or function, or
  keyword of the corresponding shell (beware that for /bin/sh
  (ksh93), the list of builtins depends on $PATH: if
  /opt/ast/bin is in front of $PATH, a few more builtins are
  enabled).

  $ PATH=/opt/ast/bin:$PATH sh -c 'builtin;alias' | wc -l
  82

  (plus the keywords)

- it must be disabled if the code argument starts with - or +

- it must be disabled if the value of the $SHELL environment
  variable starts with r.
  There may be other env variables (like _AST_FEATURES) that
  affect the way the shell parses and runs simple commands.


> I was not aware that ksh was all that dangerous; especially as it allows 
> crossing privilege boundaries using environment variable.

It's not limited to ksh. In all shells, you mustn't use
unsanitized data in arithmetic expressions. Some shells are
worse than others. In dash for instance, the exposure is
limited to $(($var)) / $(($1)), and the damage is limited to
assigning variables (var=PATH=7734). $((var)) there is OK
(anything other than octal, hex or decimal constants with
optional -/+ sign and blanks triggers an error).
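
To make that exposure concrete, here is a small sketch. The names
show_assignment_side_effect, untrusted and demo_target are invented
for this illustration, and "untrusted" merely stands in for data
that reached the script from outside:

```shell
# Hypothetical demo: expand untrusted text inside $((...)) and report
# whether it managed to assign the variable demo_target.
show_assignment_side_effect() {
    unset demo_target
    untrusted=$1
    # $untrusted is expanded first; the resulting text is then evaluated
    # as an arithmetic expression, assignments included.
    : "$(($untrusted))"
    printf '%s\n' "${demo_target-unset}"
}

show_assignment_side_effect 'demo_target=7734'
```

A harmless expression such as 1+1 leaves demo_target unset, while
the assignment above changes it as a side effect of the expansion.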

> Not quite as bad as "Shellshock"; not even close.  Still another reason to 
> avoid the shell when it is not actually needed to start a new command.

The vulnerability in this case is not in the shell, but in the
scripts using that feature (if they forget to sanitize data
before using in arithmetic context). The feature could be seen
as a misfeature though as it makes it difficult to write safe
shell code.

That can't be fully fixed though as long as $(($1)) is required
(by POSIX) to evaluate the arithmetic expression stored in the
first positional parameter.
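
One way a script could do that sanitizing is sketched below. The
helper names is_decimal_integer and safe_add are hypothetical, and
rejecting leading zeros (so that nothing is read as octal) is a
choice made for this sketch, not something required by POSIX:

```shell
# Hypothetical helper: succeed only for an optionally signed decimal
# integer without leading zeros; anything else is rejected.
is_decimal_integer() {
    case ${1#[-+]} in
        '' | *[!0-9]* | 0?*) return 1 ;;
    esac
    return 0
}

# Hypothetical wrapper: validate both operands before they reach $((...)).
safe_add() {
    is_decimal_integer "$1" && is_decimal_integer "$2" || return 1
    echo "$(($1 + $2))"
}

safe_add 2 3
safe_add 'PATH=7734' 1 || echo 'rejected: not a plain decimal integer' >&2
```

Only strings that pass the check are ever expanded in arithmetic
context, so assignments or subscripts hidden in the data are refused
rather than evaluated.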

> I'm not sure why we ended up in Solaris with 18 commands which are 
> basically built-in ksh93 commands that make little sense as individual
> executables:
> 
> alias    cd       fc    getopts  jobs  print  test  ulimit  unalias
> bg       command  fg    hash     kill  read   type  umask   wait
> 
> It seems that is being tested in XPG4.os/procenv/confstr/
> 
> The only ones that make sense are "kill" & "print".
[...]

Except for "print", that's a POSIX requirement (which many
systems ignore), as non-special builtins have to be available as
standalone commands (at least for exec*p(), env, find -exec,
and all the commands that can execute commands).

-- 
Stephane



[1003.1(2016)/Issue7+TC2 0001313]: Underline tags in strftime Application Usage

2020-02-03 Thread Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
https://austingroupbugs.net/view.php?id=1313 
== 
Reported By:          dennisw
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1313
Category:             System Interfaces
Type:                 Error
Severity:             Editorial
Priority:             normal
Status:               Resolved
Name:                 Dennis Wölfing
Organization:
User Reference:
Section:              strftime
Page Number:          2049
Line Number:          65729
Interp Status:        ---
Final Accepted Text:  See
                      https://austingroupbugs.net/view.php?id=1313#c4765.
Resolution:           Accepted As Marked
Fixed in Version:
==
Date Submitted:       2020-01-02 13:57 UTC
Last Modified:        2020-02-03 16:31 UTC
==
Summary:              Underline tags in strftime Application Usage
== 

Issue History 
Date Modified     Username        Field                    Change
==
2020-01-02 13:57  dennisw         New Issue
2020-01-02 13:57  dennisw         Name                     => Dennis Wölfing
2020-01-02 13:57  dennisw         Section                  => strftime
2020-01-02 13:57  dennisw         Page Number              => 2049
2020-01-02 13:57  dennisw         Line Number              => 65729
2020-02-03 16:30  Don Cragun      Note Added: 0004765
2020-02-03 16:31  Don Cragun      Interp Status            => ---
2020-02-03 16:31  Don Cragun      Final Accepted Text      => See
                  https://austingroupbugs.net/view.php?id=1313#c4765.
2020-02-03 16:31  Don Cragun      Status                   New => Resolved
2020-02-03 16:31  Don Cragun      Resolution               Open => Accepted As
                                                           Marked
==




[1003.1(2016)/Issue7+TC2 0001313]: Underline tags in strftime Application Usage

2020-02-03 Thread Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1313 
== 
Reported By:          dennisw
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1313
Category:             System Interfaces
Type:                 Error
Severity:             Editorial
Priority:             normal
Status:               New
Name:                 Dennis Wölfing
Organization:
User Reference:
Section:              strftime
Page Number:          2049
Line Number:          65729
Interp Status:        ---
Final Accepted Text:
==
Date Submitted:       2020-01-02 13:57 UTC
Last Modified:        2020-02-03 16:30 UTC
==
Summary:              Underline tags in strftime Application Usage
== 

-- 
 (0004765) Don Cragun (manager) - 2020-02-03 16:30
 https://austingroupbugs.net/view.php?id=1313#c4765 
-- 
On page 2049 line 65729 section strftime, change:
(<+/->Y-MM-DD)
to:
(<+/->Y-MM-DD, i.e. with a 5 or more digit year) 

Issue History 
Date Modified     Username        Field                    Change
==
2020-01-02 13:57  dennisw         New Issue
2020-01-02 13:57  dennisw         Name                     => Dennis Wölfing
2020-01-02 13:57  dennisw         Section                  => strftime
2020-01-02 13:57  dennisw         Page Number              => 2049
2020-01-02 13:57  dennisw         Line Number              => 65729
2020-02-03 16:30  Don Cragun      Note Added: 0004765
==




[1003.1(2016)/Issue7+TC2 0001311]: j command incorrectly referred to in ed's rationale section

2020-02-03 Thread Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
https://austingroupbugs.net/view.php?id=1311 
== 
Reported By:          andras_farkas
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1311
Category:             Shell and Utilities
Type:                 Error
Severity:             Editorial
Priority:             normal
Status:               Resolved
Name:                 Andras Farkas
Organization:
User Reference:
Section:              ed
Page Number:          2689
Line Number:          87741
Interp Status:        ---
Final Accepted Text:
Resolution:           Accepted
Fixed in Version:
==
Date Submitted:       2019-12-20 10:09 UTC
Last Modified:        2020-02-03 16:09 UTC
==
Summary:              j command incorrectly referred to in ed's rationale
                      section
== 

Issue History 
Date Modified     Username        Field                    Change
==
2019-12-20 10:09  andras_farkas   New Issue
2019-12-20 10:09  andras_farkas   Name                     => Andras Farkas
2019-12-20 10:09  andras_farkas   Section                  => ed
2019-12-20 10:09  andras_farkas   Page Number              => ed
2019-12-20 10:09  andras_farkas   Line Number              => 1200
2019-12-20 10:11  andras_farkas   Note Added: 0004694
2019-12-20 10:27  geoffclare      Page Number              ed => 2689
2019-12-20 10:27  geoffclare      Line Number              1200 => 87741
2019-12-20 10:27  geoffclare      Interp Status            => ---
2019-12-20 10:27  geoffclare      Description Updated
2019-12-20 10:27  geoffclare      Desired Action Updated
2019-12-20 10:27  shware_systems  Note Added: 0004695
2019-12-20 10:29  geoffclare      Note Added: 0004696
2019-12-20 10:29  geoffclare      Description Updated
2019-12-20 10:30  geoffclare      Note Edited: 0004696
2019-12-20 10:33  andras_farkas   Note Added: 0004697
2020-01-12 01:46  andras_farkas   Note Added: 0004720
2020-02-03 16:09  Don Cragun      Status                   New => Resolved
2020-02-03 16:09  Don Cragun      Resolution               Open => Accepted
==




[1003.1(2016)/Issue7+TC2 0001312]: ctags -v example in ctags's rationale section missing a newline

2020-02-03 Thread Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
https://austingroupbugs.net/view.php?id=1312 
== 
Reported By:          andras_farkas
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1312
Category:             Shell and Utilities
Type:                 Error
Severity:             Editorial
Priority:             normal
Status:               Resolved
Name:                 Andras Farkas
Organization:
User Reference:
Section:              ctags
Page Number:          2625
Line Number:          85386
Interp Status:        ---
Final Accepted Text:
Resolution:           Accepted
Fixed in Version:
==
Date Submitted:       2019-12-20 10:49 UTC
Last Modified:        2020-02-03 16:12 UTC
==
Summary:              ctags -v example in ctags's rationale section
                      missing a newline
== 

Issue History 
Date Modified     Username        Field                    Change
==
2019-12-20 10:49  andras_farkas   New Issue
2019-12-20 10:49  andras_farkas   Name                     => Andras Farkas
2019-12-20 10:49  andras_farkas   Section                  => ctags
2020-01-12 01:46  andras_farkas   Note Added: 0004721
2020-02-03 16:12  geoffclare      Page Number              => 2625
2020-02-03 16:12  geoffclare      Line Number              => 85386
2020-02-03 16:12  geoffclare      Interp Status            => ---
2020-02-03 16:12  geoffclare      Status                   New => Resolved
2020-02-03 16:12  geoffclare      Resolution               Open => Accepted
==




Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Casper . Dik


>"casper@oracle.com"  wrote:

>> The only ones that make sense are "kill" & "print".
>
>I would say that "print" is not needed since it is not required to be callable
>via exec(), since it is a ksh88/ksh93 private builtin.


Right, "print" is not tested for in the test suite.

Casper



Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Joerg Schilling
"casper@oracle.com"  wrote:

> I'm not sure why we ended up in Solaris with 18 commands which are 
> basically built-in ksh93 commands that make little sense as individual
> executables:
>
> alias    cd       fc    getopts  jobs  print  test  ulimit  unalias
> bg       command  fg    hash     kill  read   type  umask   wait
>
> It seems that is being tested in XPG4.os/procenv/confstr/
>
> The only ones that make sense are "kill" & "print".

I would say that "print" is not needed since it is not required to be callable 
via exec(), since it is a ksh88/ksh93 private builtin.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Casper . Dik


>2020-02-03 12:40:45 +0100, Joerg Schilling:
>[...]
>> > It looks like it's caused by an "optimisation" in its
>> > libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
>> > after all.
>> 
>> To which Solaris version does this apply?
>
>That was 11.4

Yes.

>> > $ ksh -c 'printf %d 1+1'
>> > printf: 1+1 not completely converted
>> 
>> This is the correct expected output for /usr/bin/printf
>
>Yes, that's the point, /usr/bin/printf was called instead of ksh
>(ksh93 here) and its builtin.
>
>> > What? ksh's printf does take arithmetic expressions as arguments
>> > for %d.
>> >
>> > $ ksh -c 'printf %d 1+1;'
>> > 2
>> > $ ksh -c 'printf %d 1+1' ksh
>> > 2
>> >
>> > Adding that ; special shell character or an extra argument
>> > disables the optimisation.
>> 
>> But this seems to be an easteregg from ksh93.
>[...]
>
>printf %d 1+1 to output 2 is expected in ksh where in most
>places where a number is expected, any arithmetic expression is
>accepted as well. That behaviour was also copied by zsh.
>
>It causes all sorts of security headaches as arithmetic expressions can assign
>variables (like for IFS=1234567890, PATH=7734) or run arbitrary code (like
>a=[$(evil)0])
>
>$ a=2 b='a[$(evil)0]' ksh -c 'printf %d b' # /usr/bin/printf run
>printf: b expected numeric value
>$ a=2 b='a[$(evil)0]' ksh -c 'printf "%d" b' # ksh printf run
>ksh: printf: evil: not found [No such file or directory]
>
>The easteregg here is more solaris libc:exec*() bypassing the
>execution of a shell in some cases.


Right.  I think it may need some fine tuning but I think it is fine to
avoid the shell when it is not needed.

I was not aware that ksh was all that dangerous; especially as it allows 
crossing privilege boundaries using environment variable.

Not quite as bad as "Shellshock"; not even close.  Still another reason to 
avoid the shell when it is not actually needed to start a new command.

I'm not sure why we ended up in Solaris with 18 commands which are 
basically built-in ksh93 commands that make little sense as individual
executables:

alias    cd       fc    getopts  jobs  print  test  ulimit  unalias
bg       command  fg    hash     kill  read   type  umask   wait

It seems that is being tested in XPG4.os/procenv/confstr/

The only ones that make sense are "kill" & "print".

Casper




Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines

2020-02-03 Thread Robert Elz
Date: Fri, 31 Jan 2020 23:18:31 +0100
From: Steffen Nurpmeso
Message-ID: <20200131221831.vapcz%stef...@sdaoden.eu>

  | May i ask whether you have numbers on how often "." is used with
  | $PATH searching?

Of course you may ask.

But to save you actually doing that: no, I have no idea. Like many
other things in this area, all that we really know is that it might
be, as that's how it was specified to work - even if none of us
usually ever use it that way.

But the relevant change here was identical for both "." and "exec"
and "exec" using PATH search is rather more likely.

  | But, then: for explicit relative
  | file names, using ./ is a way to accomplish escaping.

Of course, but if that were adequate for the original issue, the
reported problem would not have needed a change at all.   That's
a different kettle of fish.   I'm assuming (without any personal
evidence of it) that there actually is/was a problem to fix, and
then simply asking about whether the fix that was made perhaps
overdoes things a little (requires more than was strictly necessary).

kre



Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)

2020-02-03 Thread Joerg Schilling
Stephane CHAZELAS  wrote:

> (what's "whatwhell" by the way? or do you mean Sven Mascheck's
> "whatshell"?)

Correct, sticky fingers ;-)

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: raise(0) (was: Exit status 128)

2020-02-03 Thread Geoff Clare
Shware Systems  wrote, on 03 Feb 2020:
>
>> C99 only specifies the behaviour of raise() and signal() for SIGABRT,
>> SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM.  The behaviour for all
>> other "sig" argument values is either implementation-defined or undefined.
>> 7.14 para 4 says "The complete set of signals, their semantics, and
>> their default handling is implementation-defined; all signal numbers
>> shall be positive."  This means that "the complete set of signals"
>> for which an implementation defines the behaviour cannot include 0,
>> because 0 is not positive (see below).  Thus the behaviour for 0 is
>> not implementation-defined and must therefore be undefined.

>> Yes, 0 is not "positive".  When used on its own, "positive" means
>> "greater than zero", ...

> [Zero] has the same sign bit value as other positive values so is
> positive.

> As such, 7.14 para 3 doesn't even preclude one of the required
> signals from being assigned to zero, it just says, via "which expand
> to positive integer constant expressions with type int and distinct
> values", if one uses it the others can't.

I suggest you try making that claim to the C committee and give them a
good laugh.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: About printf %2$s

2020-02-03 Thread Geoff Clare
Joerg Schilling  wrote, on 03 Feb 2020:
>
> > I don't know where the %2$x format in printf(3) comes from.
> 
> Well, from my private history memory, I have in mind that Sun introduced it
> in the 1980s, when the basics for gettext(3) were created. So this must
> have been no later than SunOS-4.0. I believe there was a related talk at
> a Sun User Group meeting at that time, with examples for printing date strings.

In XPG2 there was a separate nl_printf() function which handled this
format (and only this format - you had to call printf() to use %x and
nl_printf() to use %2$x).  It was in the same "NLS" (Native Language
Support) section of the spec as nl_init(), nl_langinfo() and catopen().
XPG2 was published in Jan 1987, so the work on specifying nl_printf()
would have been done in the year or so before that. I don't know if it
was based on an existing implementation.

The functionality from nl_printf() was merged into printf() in XPG3.
(And nl_init() was dropped in favour of the new setlocale() function
from the draft C standard.)

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Stephane CHAZELAS
2020-02-03 12:40:45 +0100, Joerg Schilling:
[...]
> > It looks like it's caused by an "optimisation" in its
> > libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
> > after all.
> 
> To which Solaris version does this apply?

That was 11.4

> > $ ksh -c 'printf %d 1+1'
> > printf: 1+1 not completely converted
> 
> This is the correct expected output for /usr/bin/printf

Yes, that's the point, /usr/bin/printf was called instead of ksh
(ksh93 here) and its builtin.

> > What? ksh's printf does take arithmetic expressions as arguments
> > for %d.
> >
> > $ ksh -c 'printf %d 1+1;'
> > 2
> > $ ksh -c 'printf %d 1+1' ksh
> > 2
> >
> > Adding that ; special shell character or an extra argument
> > disables the optimisation.
> 
> But this seems to be an easteregg from ksh93.
[...]

Having printf %d 1+1 output 2 is expected in ksh, where in most
places where a number is expected, any arithmetic expression is
accepted as well. That behaviour was also copied by zsh.

It causes all sorts of security headaches, as arithmetic expressions
can assign variables (like IFS=1234567890 or PATH=7734) or run
arbitrary code (like a[$(evil)0]).

$ a=2 b='a[$(evil)0]' ksh -c 'printf %d b' # /usr/bin/printf run
printf: b expected numeric value
$ a=2 b='a[$(evil)0]' ksh -c 'printf "%d" b' # ksh printf run
ksh: printf: evil: not found [No such file or directory]

The Easter egg here is rather Solaris' libc:exec*() bypassing the
execution of a shell in some cases.
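
For scripts that must pass untrusted input to printf %d, one defensive
option is to check that the operand is a literal decimal integer before
the builtin ever sees it. A minimal sketch in POSIX sh follows;
is_plain_integer and safe_printf_d are hypothetical helper names, not
something provided by ksh or taken from this thread:

```shell
# Hypothetical helper: succeed only if $1 is an optionally signed
# string of decimal digits (no operators, brackets or expansions).
is_plain_integer() {
  case ${1#[-+]} in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper: refuse anything that is not a literal integer,
# so an arithmetic-evaluating printf builtin never sees an expression.
safe_printf_d() {
  if is_plain_integer "$1"; then
    printf '%d\n' "$1"
  else
    printf 'not a plain integer: %s\n' "$1" >&2
    return 1
  fi
}
```

With this, safe_printf_d 'a[$(evil)0]' is rejected without the operand
reaching printf. Operands with leading zeros still pass the check and
may be read as octal by printf, so tighten the pattern if that matters.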

-- 
Stephane



Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)

2020-02-03 Thread Stephane CHAZELAS
2020-02-03 11:43:38 +0100, Joerg Schilling:
[...]
> > $ /usr/xpg4/bin/sh -c 'type printf'
> > printf is a shell builtin
> 
> This does not apply to OpenSolaris, but on OpenSolaris, this was closed
> source as ksh88 is not available under OSS license.
> 
> This also does not apply to Oracle Solaris 11.3, so where did you test?
> 
> Could you run "whatwhell" with this shell please?
[...]

That was Solaris 11.4 in a VM as freshly downloaded from Oracle.

Yes, it's based on ksh88, but note that as seen in my follow-up
messages on austin-group-l
(https://www.mail-archive.com/austin-group-l@opengroup.org/msg05548.html,
https://www.mail-archive.com/austin-group-l@opengroup.org/msg05549.html,
I had reduced the distribution by then as I don't expect it's of
much interest to GNU gettext), the system actually didn't run
/usr/xpg4/bin/sh at all in that case but ran /usr/bin/type
printf instead, where /usr/bin/type appears to be some special
build of ksh93. Which explains why it said printf was builtin.

If you don't get the same, then possibly that (undocumented
AFAICT) bypassing of sh is a new feature in 11.4 or you have
/usr/xpg4/bin ahead of /bin and/or /usr/bin in your $PATH, in
which case /usr/xpg4/bin/type is called instead.

(what's "whatwhell" by the way? or do you mean Sven Mascheck's
"whatshell"?)

-- 
Stephane



Re: Solaris /usr/xpg4/bin/sh builtin handling (Was: About printf %2$s)

2020-02-03 Thread Joerg Schilling
Stephane Chazelas  wrote:

> 2020-02-01 10:47:46 +, Stephane Chazelas:
> [...]
> > That doesn't explain why it's different with ${0+type} or when
> > there's more than the one invocation of "type" in the script.
> [...]
>
> OK, I see what's going on.
>
> It looks like it's caused by an "optimisation" in its
> libc:exec*(), so /usr/xpg4/bin/sh and POSIX are not to blame
> after all.

To which Solaris version does this apply?

> From what I can gather from my tests, when exec*()'s filename
> argument is /bin/sh or any of its other paths (/usr/bin/sh,
> /bin/ksh, /usr/bin/ksh93, /bin/./sh...) or /usr/xpg4/bin/sh,
> (but not /usr/xpg4/bin/./sh), the first argument (argv[0]) is
> anything that doesn't start with "r", including "-sh", the
> second is "-c" (not "-cc", not "-uc"...) and the third is some
> very simple shell code, that doesn't contain non-ASCII
> characters nor shell special characters other than spc and tab,
> and no further argument then

> exec*() takes the shell's role at parsing the command line,
> splits it on spc and tab and tries to execute the corresponding
> command by itself. If it can't do it (command not found for
> instance), then it falls back to executing the shell normally.

Strange.
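
The "very simple shell code" condition in the quoted description can be
approximated as follows. This is only a sketch of the third-argument
test as described above: the exact character set Solaris uses is not
stated in the thread, so the allow-list below is an assumption,
looks_simple is a hypothetical name, and the other conditions (the
filename, argv[0], the literal -c, no further arguments) are not
modelled:

```shell
# Hypothetical predicate: succeeds if the -c string contains only characters
# this sketch assumes to be harmless (ASCII letters, digits, space, tab and a
# little punctuation). The real list used by Solaris libc is not known here.
looks_simple() {
  [ -n "$1" ] || return 1
  # Delete every allowed character and count what is left over; any shell
  # special character, newline or non-ASCII byte leaves a non-zero count.
  leftover=$(printf '%s' "$1" | LC_ALL=C tr -d 'A-Za-z0-9 \t_./:,+@%-' | wc -c)
  [ "$leftover" -eq 0 ]
}

for cmd in 'printf %d 1+1' 'printf %d 1+1;'; do
  if looks_simple "$cmd"; then
    echo "simple:     $cmd"
  else
    echo "not simple: $cmd"
  fi
done
```

The two sample strings are the ones from the quoted ksh examples: the
first would be a candidate for the shortcut, while the trailing ; in the
second is enough to rule it out.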

> $ ksh -c 'printf %d 1+1'
> printf: 1+1 not completely converted

This is the correct expected output for /usr/bin/printf

> What? ksh's printf does take arithmetic expressions as arguments
> for %d.
>
> $ ksh -c 'printf %d 1+1;'
> 2
> $ ksh -c 'printf %d 1+1' ksh
> 2
>
> Adding that ; special shell character or an extra argument
> disables the optimisation.

But this seems to be an Easter egg from ksh93.

Jörg




RE: raise(0) (was: Exit status 128)

2020-02-03 Thread Shware Systems

The C standard does not distinguish zero as a separate, single-valued domain.
Some math theories, academically, characterize it that way, but those are not
the basis of any of the 3 signed integer forms. It has the same sign bit value
as other positive values, so it is positive. Negative zero, for the two forms that
support it, has the complementary sign bit value. Also, the domains of unsigned
types are considered positive values too, and these all include zero.

As such, 7.14 para 3 doesn't even preclude one of the required signals from 
being assigned to zero, it just says, via "which expand to positive integer 
constant expressions with type int and distinct values", if one uses it the 
others can't. It is POSIX that disallows zero as a CX modification of the 
requirements, so the usage of it by a platform would be extension behavior, as 
I note, but the behavior expected is fully defined.

On Monday, February 3, 2020 Geoff Clare wrote:
> Shware Systems wrote, on 31 Jan 2020:
> >
> > Subject: Re: Exit status 128 [was: exit status for false should be 1-125]
> >
> > The value 128 is potentially special to platforms implementing
> > extensions, as it corresponds to the signo 0. While POSIX uses this as
> > the 'validate pid' function of kill(), the wording in the C standard
> > for raise() requires a raise(0), especially when the encoding for int
> > distinguishes positive and negative zero values, to be delivered to a
> > process for handling by an application's signal handler.
>
> C99 only specifies the behaviour of raise() and signal() for SIGABRT,
> SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM.  The behaviour for all
> other "sig" argument values is either implementation-defined or undefined.
> 7.14 para 4 says "The complete set of signals, their semantics, and
> their default handling is implementation-defined; all signal numbers
> shall be positive."  This means that "the complete set of signals"
> for which an implementation defines the behaviour cannot include 0,
> because 0 is not positive (see below).  Thus the behaviour for 0 is
> not implementation-defined and must therefore be undefined.
>
> Yes, 0 is not "positive".  When used on its own, "positive" means
> "greater than zero", and likewise "negative" means "less than zero".
> I.e. +0 is zero (with the sign bit unset), it is not "positive",
> and -0 is zero (with the sign bit set), it is not "negative".  The
> terms "positive zero" and "negative zero" are misleading and should
> be avoided in formal use.
>
> -- 
> Geoff Clare
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: About printf %2$s (Was: Coordination on standardizing gettext() in future POSIX)

2020-02-03 Thread Joerg Schilling
Stephane Chazelas  wrote:

> 2020-01-24 15:14:48 +0100, Joerg Schilling:

> > printf "Hello World %2$s %1$s\\n"  1 2
> [...]
> > mksh    a ksh88 clone
> [...]
>
> ksh88 had no printf builtin.

OK, this is a mistake that frequently happens because most shells have a builtin
printf.

> You might have been misled by Solaris' /usr/xpg4/bin/sh

I believe that Oracle Solaris 11 still uses ksh88, as it is harder to make 
ksh93 POSIX compliant than to work with the ksh88 variant that has already been 
adopted.

> On Solaris 11,
>
> $ /usr/xpg4/bin/sh -c 'type printf'
> printf is a shell builtin

This does not apply to OpenSolaris, but on OpenSolaris, this was closed source 
as ksh88 is not available under OSS license.

This also does not apply to Oracle Solaris 11.3, so where did you test?

Could you run "whatwhell" with this shell please?

> mksh has no printf builtin either

I know but I was confused.

> AFAIK, the printf utility is a POSIX invention (ksh93 release
> notes do mention the POSIX origin), possibly inspired by
> research Unix 10th edition which had a printf utility (but not
> %b for instance) possibly from as far back as

I believe that %b was a POSIX invention.

SVr4 already had a printf(1) in 1988, but this was a 10-liner that just called
printf(3) and thus could only handle strings.

> 1986 (if we're to believe the timestamp of printf.c at
> https://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v10/v10src.tar.bz2
> $ tar tvf v10src.tar.bz2 cmd/printf.c
> -rw-rw-r-- root/root  3621 1986-07-29 20:40 cmd/printf.c

This is more than Svr4 had in 1988.

> I don't know where the %2$x format in printf(3) comes from.

Well, from my private history memory, I have in mind that Sun introduced it
in the 1980s, when the basics for gettext(3) were created. So this must
have been no later than SunOS-4.0. I believe there was a related talk at
a Sun User Group meeting at that time, with examples for printing date strings.

It was in SVr4 from the beginning, and since this was in printf(3) on SVr4, it
also applies to the early printf(1) in SVr4.

Jörg




raise(0) (was: Exit status 128)

2020-02-03 Thread Geoff Clare
Shware Systems  wrote, on 31 Jan 2020:
>
> Subject: Re: Exit status 128 [was: exit status for false should be 1-125]
> 
> The value 128 is potentially special to platforms implementing
> extensions, as it corresponds to the signo 0. While POSIX uses this as
> the 'validate pid' function of kill(), the wording in the C standard
> for raise() requires a raise(0), especially when the encoding for int
> distinguishes positive and negative zero values, to be delivered to a
> process for handling by an application's signal handler.

C99 only specifies the behaviour of raise() and signal() for SIGABRT,
SIGFPE, SIGILL, SIGINT, SIGSEGV, and SIGTERM.  The behaviour for all
other "sig" argument values is either implementation-defined or undefined.
7.14 para 4 says "The complete set of signals, their semantics, and
their default handling is implementation-defined; all signal numbers
shall be positive."  This means that "the complete set of signals"
for which an implementation defines the behaviour cannot include 0,
because 0 is not positive (see below).  Thus the behaviour for 0 is
not implementation-defined and must therefore be undefined.

Yes, 0 is not "positive".  When used on its own, "positive" means
"greater than zero", and likewise "negative" means "less than zero".
I.e. +0 is zero (with the sign bit unset), it is not "positive",
and -0 is zero (with the sign bit set), it is not "negative".  The
terms "positive zero" and "negative zero" are misleading and should
be avoided in formal use.
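
For readers less familiar with the kill() side of this: at the shell
level, signal number 0 is the 'validate pid' probe mentioned in the
quoted text, and a command killed by a signal yields an exit status
above 128. A small sketch in POSIX sh, which assumes an sh on PATH and
that the child is not ignoring SIGTERM (the variable name status is
just illustrative):

```shell
# kill -0 delivers no signal; it only reports whether the process exists
# and whether we are allowed to signal it.
if kill -0 "$$" 2>/dev/null; then
  echo "process $$ can be signalled"
fi

# A child shell that terminates itself with SIGTERM. POSIX only requires the
# reported status to be greater than 128; many shells use 128 + signal number.
status=0
sh -c 'kill -TERM $$' || status=$?
echo "exit status after SIGTERM: $status"
```

None of this says anything about raise(0) in C, which is the point under
dispute; it only shows the POSIX behaviour the thread takes as given.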

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England