Re: [PATCH] Make mktokens accept a random TMPDIR, replace `...` with $(...).

2018-11-15 Thread Eric Blake

On 11/15/18 9:32 AM, Devin Hussey wrote:

From b9724fc82eda2b0d164c33ad3e871d38b298d1ad Mon Sep 17 00:00:00 2001

From: Devin Hussey 
Date: Thu, 15 Nov 2018 10:30:05 -0500
Subject: [PATCH] Make mktokens accept a random TMPDIR, replace `...` with
  $(...).

Sorry about the multiple commits at once.

Signed-off-by: Devin Hussey 
---
  src/mktokens | 17 ++---
  1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/mktokens b/src/mktokens
index cd52241..ec801cc 100644
--- a/src/mktokens
+++ b/src/mktokens
@@ -37,7 +37,10 @@
  # token marks the end of a list.  The third column is the name to print in
  # error messages.

-cat > /tmp/ka$$ <<\!
+# set TMPDIR if it isn't already
+[ -z "${TMPDIR}" ] && TMPDIR="/tmp"


Shorter as:

: "${TMPDIR:=/tmp}"
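
For what it's worth, a minimal sketch of how the shorter form behaves. The function name set_tmpdir is only illustrative and is not part of mktokens:

```shell
# Hypothetical helper, not from the patch: default TMPDIR the short way.
set_tmpdir() {
    # ':' is a no-op; the expansion assigns /tmp only when TMPDIR is
    # unset or empty, and leaves any existing value alone.
    : "${TMPDIR:=/tmp}"
    printf '%s\n' "$TMPDIR"
}
```

Calling it with TMPDIR unset or empty prints /tmp; calling it with TMPDIR=/var/tmp prints /var/tmp.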

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: Unexpected behaviour: double backslash in single quotes

2018-09-06 Thread Eric Blake

On 09/06/2018 04:40 AM, Joshua Phillips wrote:

Escape sequences don't work in single quotes:

$ echo 'hello\world'
hello\world
$ echo 'hello\'


Warning. Use of 'echo' and backslashes is non-portable.  There are two 
classical behaviors:
1. backslashes are not special to echo unless you pass -e, so you also 
have to have -n to elide a trailing newline (this is the behavior of 
bash by default)
2. backslashes ARE special by default, so you don't need -e; and \c 
exists to elide a trailing newline, so you don't need -n (this is the 
behavior of dash by default, and the behavior required by POSIX; bash 
can also be configured to run in this mode via 'set -o posix; shopt -s 
xpg_echo')
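
If the goal is simply to print a string containing backslashes the same way in both kinds of shell, printf with a %s format avoids the question entirely. A small sketch (print_literal is a made-up name):

```shell
# Hypothetical helper: print one argument exactly as the shell passed it.
print_literal() {
    # %s copies the argument as-is; printf only interprets escapes that
    # appear in the format string, never in the %s operand.
    printf '%s\n' "$1"
}

print_literal 'hello\\world'   # both backslashes survive
```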



Which makes it surprising that double backslashes get converted to single 
backslashes:

$ echo 'hello\\world'
hello\world

Is this intended behaviour?


Yes.  dash is obeying the POSIX-mandated behavior, and interpreting \ 
sequences by default. Since \w is not a known sequence, dash cheats and 
outputs \ as-is instead of giving you an error (although an error would 
be friendlier at reminding you that \ is active-by-default in dash). 
But since \\ is a known sequence, it gets interpreted by echo.



Bash behaves as I would have expected.


Rather, bash in its default mode does what you are used to, but violates 
POSIX.  Bash in the mode that I mentioned above (set -o posix; shopt -s 
xpg_echo) behaves like dash.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: test bug?

2018-02-19 Thread Eric Blake

On 02/19/2018 06:53 AM, Yuriy Vostrikov wrote:

Hello,

Is this expected behavior?

$ cd /tmp
$ mkdir foo
$ cd foo/
$ touch a
$ /usr/bin/test a -nt b; echo $?
0
$ /bin/bash -c 'test a -nt b; echo $?'
0
$ /bin/dash -c 'test a -nt b; echo $?'
1


Yes.  -nt is not specified by POSIX, and the behavior of -nt when one of 
the two operands does not exist can make sense under multiple 
interpretations (treat a missing file as a silent error, where both 'a 
-nt b' and 'b -nt a' fail with status 1 [dash]; treat a missing file as 
always newer, because once you make it exist it will have a newer 
timestamp [not sure if anyone does that]; treat a missing file as a hard 
error with message to stderr and status 2 [not sure if anyone does 
that]; treat a missing file as always older, perhaps because you use the 
default timestamp of Jan 1 1970 when interpreting all 0's for any file 
that fails to stat [bash, coreutils]).  The same problem of multiple 
interpretations also applies to -ot.


At any rate, I don't see it as a bug in dash, so much as your script 
making non-portable assumptions about non-standardized behavior.
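
One way to stop depending on the unspecified case is to decide the missing-file policy yourself and leave only the both-files-exist comparison to a standard tool. A sketch, assuming you want the bash/coreutils policy (a missing second file counts as older); is_newer is a hypothetical name:

```shell
# Hypothetical helper: succeed if file $1 is newer than file $2.
is_newer() {
    [ -e "$1" ] || return 1   # a missing first operand is never newer
    [ -e "$2" ] || return 0   # assumed policy: a missing second operand is older
    # Both exist: POSIX find prints $1 only when it is newer than $2;
    # -prune keeps find from descending if $1 is a directory.
    [ -n "$(find "$1" -prune -newer "$2")" ]
}
```

File names beginning with '-' would need a ./ prefix before being handed to find.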


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: echo "\\1"?

2017-07-27 Thread Eric Blake
On 07/27/2017 10:10 AM, Bosco wrote:
> That script of zziplib isn't mine, I only had to compile it once
> because it was necessary for compile other program (TeX Live).
> 
> I'm not talking about POSIX, and I don't mind what it said. I'm
> talking about the man page of dash, that said:
> 

> when \\ is reached is replaced by \.

When \\ is reached AS THE ARGUMENT to echo.

> Then, in the command
>   echo \\\\
> because \\ is reached first, then it will be replaced by '\'

No, you are demonstrating a gap in your understanding of shell quoting
rules.

echo \\\\
echo "\\\\"
echo '\\'
echo '\'"\\"

are all equivalent ways to pass the same two-character argument to echo.
That two-character argument is a valid escape sequence, which in turn means
echo outputs a single \ character then a newline.

> character, immediately after that another \\ is reached, then it will
> be replaced by another '\' character. It turns out the ouput '\\'.

If you want two \ as output, you have to pass four characters (not two)
to echo, so your input has to be one of these (or other) valid quotings:

echo '\\\\'
echo "\\\\\\\\"
echo \\'\'"\\"\\'\'"\\"\\'\'
etc.
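
To see what the shell's quoting hands over before echo does anything with it, print the argument with printf '%s' instead. A small sketch (show_arg is a made-up helper):

```shell
# Hypothetical helper: print the first argument untouched, so only the
# shell's quote removal is visible, not echo's escape processing.
show_arg() { printf '%s\n' "$1"; }

show_arg \\\\       # unquoted: each \\ pair is reduced to one \
show_arg "\\\\"     # double quotes: same reduction
show_arg '\\'       # single quotes: nothing is removed
show_arg '\'"\\"    # one \ from each quoted piece, concatenated
```

All four calls print the same two backslashes, which is the argument echo would then interpret as one escape sequence.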

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: echo "\\1"?

2017-07-27 Thread Eric Blake
On 07/27/2017 08:13 AM, Bosco wrote:
> On 27 July 2017 at 12:54, Eric Blake <ebl...@redhat.com> wrote:
>> Which man pages?  Echo is one of those programs that varies widely, and
>> you are MUCH better off using printf(1) instead of echo(1) if you are
>> trying to get newline suppression, trying to print something that might
>> begin with -, or trying to print something that might contain \.
> 
> Sorry, maybe I did't explain it correctly, I mean the man pages of the
> dash source:
> https://git.kernel.org/pub/scm/utils/dash/dash.git/tree/src/dash.1#n1202
> 
> And because of this, I got an error compiling zziplib, you may see
> https://github.com/gdraheim/zziplib/blob/v0.13.67/configure#L17542

Eww - storing generated files in git - that forces everyone that checks
out your project to use the EXACT same version of autotools to avoid
changing the generated files unintentionally.

Looking at those lines:

>   if test -f $ac_prefix_conf_INP ; then
>     echo "s/^#undef  *\\([ABCDEFGHIJKLMNOPQRSTUVWXYZ_]\\)/#undef $ac_prefix_conf_UPP""_\\1/" > conftest.prefix

ac_prefix_conf_INP is not defined anywhere in autoconf 2.69 sources (and
you really shouldn't use the ac_ prefix if you are writing code that is
not part of autoconf proper).  I couldn't find mention of it at
https://github.com/gdraheim/zziplib/blob/v0.13.67/configure.ac, but it
may be in one of your other included files.  Can you pinpoint which part
of your configure.ac results in that part of the generated configure
file?  In all likelihood, you are using a buggy macro that is using
autoconf primitives incorrectly, and thus resulting in non-portable
code.  But without seeing the true source, I can't help you debug your
problem.
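
Whatever macro it turns out to be, the generated line would stop depending on echo's backslash handling if it used printf. A rough sketch of that idea only; write_prefix_script and prefix_upp are names I made up, not anything from zziplib or autoconf:

```shell
# Hypothetical stand-in for the configure fragment quoted above.
write_prefix_script() {
    prefix_upp=$1   # assumed to hold the upper-cased prefix
    # Single quotes keep the sed backslashes literal, and printf's %s
    # never reinterprets them, whichever shell runs this.
    printf '%s%s%s\n' \
        's/^#undef  *\([ABCDEFGHIJKLMNOPQRSTUVWXYZ_]\)/#undef ' \
        "$prefix_upp" '_\1/'
}
```

For example, 'write_prefix_script FOO > conftest.prefix' writes a sed expression ending in FOO_\1/ under both bash and dash.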

>> Arguably, since it is not required by POSIX, we don't have to do it. But
>> I also can't argue that POSIX forbids us to support \1 as an extension
>> (it says nothing about whether implementations can have additional
>> escape sequences).  So I'll argue that it is intentional as a dash
>> extension.  But if you can make dash smaller by getting rid of the
>> extension, that might be an acceptable patch.
> 
> In that case, I think, the man page of dash should be modified with
> that extension.

Indeed; or else the fact that it is NOT documented means that providing
the extension is an unintentional bug.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: echo "\\1"?

2017-07-27 Thread Eric Blake
On 07/27/2017 07:23 AM, Bosco wrote:
> According the man pages,

Which man pages?  Echo is one of those programs that varies widely, and
you are MUCH better off using printf(1) instead of echo(1) if you are
trying to get newline suppression, trying to print something that might
begin with -, or trying to print something that might contain \.
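
As a sketch of what that looks like in practice (say and say_n are hypothetical wrapper names, nothing standard):

```shell
# Hypothetical wrappers around printf(1).
say()   { printf '%s\n' "$*"; }   # like echo, but operands are never interpreted
say_n() { printf '%s' "$*"; }     # same, without the trailing newline

say -n          # prints the two characters -n, not an empty line
say_n 'no newline after this'
```

Because the data always travels through %s, a leading - or an embedded \ in the operand has no special meaning.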

> for echo command, "\\" should print '\'
> character, and \0digits should print the byte in octal base.
> But the command
> 
> echo "\\1"

This is the same as
echo '\1'

which is NOT defined by POSIX as being a valid escape sequence that echo
must recognize.

(Did you mean to test
echo '\\1'
instead?)

Here's the POSIX list of required escape sequences:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html
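
If what you actually want is a byte given by its octal value, the printf format string accepts \ddd escapes everywhere, so there is no need to rely on how a given echo treats \1 versus \01. A sketch (byte_octal is a made-up name, and it assumes its argument is one to three octal digits):

```shell
# Hypothetical helper: emit the single byte named by octal digits $1.
byte_octal() {
    # "\\$1" becomes the format \ddd, which printf itself interprets.
    printf "\\$1"
}
```

'byte_octal 001 | od -An -t o1' shows the single byte 001.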

> 
> outputs the byte 0x01 in hexadecimal (or 001 in octal).
> Is this a bad behavior or is intentional?

Arguably, since it is not required by POSIX, we don't have to do it. But
I also can't argue that POSIX forbids us to support \1 as an extension
(it says nothing about whether implementations can have additional
escape sequences).  So I'll argue that it is intentional as a dash
extension.  But if you can make dash smaller by getting rid of the
extension, that might be an acceptable patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: Parameter expansion, patterns and fnmatch

2016-09-02 Thread Eric Blake
On 09/02/2016 09:46 AM, Herbert Xu wrote:
> On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote:
>>
>> 2.13.1 Patterns Matching a Single Character
>>
>> [
>> If an open bracket introduces a bracket expression as in XBD RE
>> Bracket Expression, except that the <exclamation-mark> character ( '!' )
>> shall replace the <circumflex> character ( '^' ) in its role in a
>> non-matching list in the regular expression notation, it shall introduce
>> a pattern bracket expression. A bracket expression starting with an
>> unquoted <circumflex> character produces unspecified results. Otherwise,
>> '[' shall match the character itself.
> 
> BTW, this last sentence is not present in
> 
> http://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13
> 

That's the 2004 edition (TC1 of the 2001 spec, aka Issue 6).

> So I presume it's a newer unreleased revision.

Newer but released (TC2 of the 2008 spec, aka Issue 7):
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01

The requirement has been there for 8 years now.

> 
> Seriously, you guys are turning POSIX into a joke by introducing
> all these new requirements.  At this point I think we should
> pretty much give up on POSIX compliance the way it's headed.

I hope you're just stating that out of frustration, and not something
that you actually intend to follow through with.  And if there is a
requirement being considered in the Austin Group that you disagree with,
please speak up on the Austin Group - membership is free.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Parameter expansion, patterns and fnmatch

2016-09-02 Thread Eric Blake
On 09/02/2016 09:29 AM, Herbert Xu wrote:
> On Fri, Sep 02, 2016 at 09:25:15AM -0500, Eric Blake wrote:
>>
>>>> This also affects
>>>>
>>>> case [a in [?) echo ok ;; *) echo bad ;; esac
>>>>
>>>> which should print ok.
>>>
>>> Even ksh prints bad here.
>>
>> So ksh is also buggy.
> 
> Good luck writing a script with an unquoted [ expecting it to be
> portable :)

[ '' ] || echo empty

There, I just wrote a portable script with unquoted [ portably
interpreted as itself and not as a bracket filename expansion pattern.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Parameter expansion, patterns and fnmatch

2016-09-02 Thread Eric Blake
On 09/02/2016 09:04 AM, Herbert Xu wrote:

>> Yes, this looks like a bug in dash. With the default --disable-fnmatch 
>> code, when dash encounters [ in a pattern, it immediately treats the 
>> following characters as part of the set. If it then encounters the end 
>> of the pattern without having seen a matching ], it attempts to reset 
>> the state and continue as if [ was treated as a literal character right 
>> from the start.

> 
> POSIX says:
> 
> 9.3.3 BRE Special Characters
> 
> A BRE special character has special properties in certain contexts.
...
> An
> expression containing a '[' that is not preceded by a backslash
> and is not part of a bracket expression produces undefined results.


Ah, but POSIX also says:

2.13.1 Patterns Matching a Single Character

[
If an open bracket introduces a bracket expression as in XBD RE
Bracket Expression, except that the <exclamation-mark> character ( '!' )
shall replace the <circumflex> character ( '^' ) in its role in a
non-matching list in the regular expression notation, it shall introduce
a pattern bracket expression. A bracket expression starting with an
unquoted <circumflex> character produces unspecified results. Otherwise,
'[' shall match the character itself.

So while a lone '[' is unspecified in a normal BRE, it is well-defined
in a shell filename pattern matching context.  Since '[' is not a
bracket expression, it MUST be treated as a literal '[', so ${foo#[}
MUST strip the leading [ from the contents of foo, without requiring
that the [ be quoted.
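
Until every shell agrees on the unquoted form, a script that has to run today can sidestep the question by quoting the bracket, which is unambiguous everywhere. A sketch (strip_leading_bracket is a hypothetical name):

```shell
# Hypothetical helper: remove one leading '[' from its argument, if present.
strip_leading_bracket() {
    foo=$1
    # \[ is a quoted bracket, so it can only ever match a literal '['.
    printf '%s\n' "${foo#\[}"
}
```

This is only a workaround for scripts; it does not change what the standard requires of the unquoted ${foo#[}.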

> 
>> This also affects
>>
>> case [a in [?) echo ok ;; *) echo bad ;; esac
>>
>> which should print ok.
> 
> Even ksh prints bad here.

So ksh is also buggy.

> 
> I would however consider a patch that simplifies the code in the
> undefined case.

Except that it is well-defined by POSIX, not undefined.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: heredoc and subshell

2016-02-23 Thread Eric Blake
[adding the Austin Group]

On 02/23/2016 03:07 PM, Oleg Bulatov wrote:
> Hello,
> 
> trying to minimize a shell code I found an unobvious moment with heredocs and 
> subshells.

Thanks for a cool testcase.

> 
> Is it specified by POSIX how next code should be parsed? dash output for this 
> code differs from bash and zsh.

XCU 2.3 says:

When an io_here token has been recognized by the grammar (see Shell
Grammar), one or more of the subsequent lines immediately following the
next NEWLINE token form the body of one or more here-documents and shall
be parsed according to the rules of Here-Document.

and 2.7.4 says:

The here-document shall be treated as a single word that begins after
the next <newline> and continues until there is a line containing only
the delimiter and a <newline>, with no <blank> characters in between.
Then the next here-document starts, if there is one.

but with no mention of what happens if you somehow manage to make the
next <newline> be part of an incomplete shell word on the line
containing the here-doc operator.

> 
> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
> 
> prefix A <<XXX && echo "$(prefix B <<XXX
> echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE <<DASH_CODE
> echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE
> 
> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

So, it looks like bash is interpreting this as "first newline that is
not in the middle of another shell word": it parses the entire $(...)
construct through line 2 as if there were no newlines, then treats the
newline after DASH_CODE as starting the heredoc, outputting A: with
line 3 as the lone line in that heredoc.  Then it moves on to
the second command in the && sequence by processing the command
substitution (a heredoc outputting line 1, then the output of line 2),
and finally to the third component of the && sequence, a
heredoc delimited by DASH_CODE, with both lines 4 and 5 output with the
DASH_CODE: prefix.

> 
> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE <<DASH_CODE
> B:echo line 3
> line 4
> line 5
> 

Meanwhile, dash is taking the literal first newline as the start of the
first heredoc, and outputting A: with line 1; then consuming the next
heredoc as lines 2 and 3 before finding the end of the command
substitution on line 4, then outputting line 5 on its own and doing
nothing else for the DASH_CODE function call.

ksh 93u+ 2012-08-01 behaves differently again:

B:echo line 1
line 2 && prefix DASH_CODE <<DASH_CODE
[...]
<newline> after a here-doc operator occurs in
the middle of a shell word.
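
For scripts that need the same result everywhere in the meantime, the safe approach is to never leave a word unterminated on a line that also carries a here-doc operator. A sketch that reworks part of the test case along those lines (the function name unambiguous and the variable inner are mine):

```shell
prefix() { sed -e "s/^/$1:/"; }

# Hypothetical rewrite: each here-doc operator is the last thing on its
# line, so the body unambiguously starts on the very next line.
unambiguous() {
    prefix A <<XXX
echo line 3
XXX
    inner=$(prefix B <<XXX
echo line 1
XXX
    echo line 2)
    printf '%s\n' "$inner"
}
```

This prints A:echo line 3, then B:echo line 1, then line 2, with no room for the different readings shown above.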

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [BUG] Illegal function names are accepted after being used as aliases

2016-02-23 Thread Eric Blake
On 02/23/2016 02:00 PM, Harald van Dijk wrote:
> 
> I was under the impression that the intent from the dash side was to
> handle all commands the same, and that impression was based on the fact
> that the . command has received additional code to handle -- even though
> there's no requirement for that. However, looking into the original bug
> report that prompted that change in more detail I see that the standard
> will very likely require support for -- in the . command in the future,
> so that doesn't hold up.

Here's the link for dot and exec supporting --:
http://austingroupbugs.net/view.php?id=252

> 
> If that intent isn't there (I'm not saying it's not; I'm unsure now),
> the list of utilities that should be extended is far smaller, if I'm not
> overlooking anything:
> - alias
> - getopts
> - type
> - exec?
> - local?

Weird that unalias already works.  Oh, because of 'unalias -a'.  I
didn't spot any others that you missed (doesn't mean there aren't any,
just that I didn't spot them).

> 
> exec is like .: there's currently no requirement to support --, but that
> requirement is likely to come in the future.

See the above link; exec must support -- if '.' does.  I also found
http://austingroupbugs.net/view.php?id=163 which confirms that 'eval' is
not required to recognize -- (nor is it prevented from doing so).  There's also
http://austingroupbugs.net/view.php?id=960 which mentioned the exit
status of export and several other special builtins, but added no
requirements related to --.

> 
> local is currently non-standard and it's hard to guess whether it will
> require support for -- if standardised.

If standardized, I expect it to require support for --, on the grounds
that 'local -r' already has meaning in bash, so local is definitely a
candidate for taking options.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [BUG] Illegal function names are accepted after being used as aliases

2016-02-23 Thread Eric Blake
On 02/23/2016 12:21 PM, Harald van Dijk wrote:
> On 23/02/2016 19:58, Eric Blake wrote:
>> On 02/23/2016 11:44 AM, Harald van Dijk wrote:
>>
>>> This matches bash's behaviour, aside from bash requiring -- to prevent
>>> detection of invalid flags to the alias command:
>>>
>>> bash-4.3$ alias -- -=true
>>
>> Then dash DOES have a bug:
> 
> Indeed, I wasn't trying to suggest otherwise, my apologies if it came
> across that way. It's not limited to the alias command though, I spotted
> at least the exit and getopts commands having the same problem, and it
> should probably be fixed for all of them at once.

getopts - definitely needs a fix
exit - fuzzy.  exit is a special built-in (unlike getopts); and XCU 2.14
states:

 "Some of the special built-ins are described as conforming to XBD
Utility Syntax Guidelines. For those that are not, the requirement in
Utility Description Defaults that "--" be recognized as a first argument
to be discarded does not apply and a conforming application shall not
use that argument. "

Conforming apps cannot expect 'exit -1' to work, and therefore cannot
expect 'exit -- -1' to work either, since the only standards-defined
value for an argument to exit is a non-negative decimal integer less
than 256.  Of course, if you want to fix it along with all the others,
that's fine; I'm just pointing out that 'exit' isn't broken as-is.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [BUG] Illegal function names are accepted after being used as aliases

2016-02-23 Thread Eric Blake
On 02/23/2016 11:44 AM, Harald van Dijk wrote:

> This matches bash's behaviour, aside from bash requiring -- to prevent
> detection of invalid flags to the alias command:
> 
> bash-4.3$ alias -- -=true

Then dash DOES have a bug:

# dash
$ alias -- -='echo hi'
alias: -- not found
$ echo $?
1
$ -
hi
$

POSIX XCU 1.4 is clear:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html

"Default Behavior: When this section is listed as "None.", it means that
the implementation need not support any options. Standard utilities that
do not accept options, but that do accept operands, shall recognize "--"
as a first argument to be discarded."

and alias takes operands, stating "OPTIONS: None.", which means POSIX
_requires_ 'alias -- -=name' to (attempt to) define only the single
alias '-', and NOT to also attempt to define '--' as an alias.

It's okay if dash allows 'alias -=blah' to define '-' as an alias as an
extension, but it MUST ignore '--' the way bash does.
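
The rule itself is tiny. Here is a sketch of what it amounts to for any utility that takes operands but no options, written as a shell function purely for illustration (define_names is a hypothetical name, not dash code):

```shell
# Hypothetical stand-in for a utility whose OPTIONS section says "None."
define_names() {
    # XCU 1.4: a first argument of "--" is discarded, not treated as an operand.
    if [ "${1-}" = "--" ]; then
        shift
    fi
    for operand in "$@"; do
        printf 'operand: %s\n' "$operand"
    done
}
```

With this, 'define_names -- -=true' sees the single operand -=true, and only a second -- would be treated as an operand.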

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [BUG] Illegal function names are accepted after being used as aliases

2016-02-23 Thread Eric Blake
On 02/23/2016 11:18 AM, Jan Verbeek wrote:
> Function definitions that use a bad function name (such as "-" and "=")
> are accepted if the function name already exists as an alias. For example:

Not necessarily a bug.

> 
> $ -
> dash: 1: -: not found
> $ - () { echo hello; }
> dash: 2: Syntax error: Bad function name
> $ -
> dash: 2: -: not found
> $ alias -=true
> $ -

This is equivalent to running 'true'.

> $ - () { echo hello; }

This is equivalent to running 'true () { echo hello; }' - the alias
expansion happens BEFORE the function definition is even parsed.  You
are NOT defining a function named '-', but one named 'true'.

> $ -

This is again equivalent to running 'true' - except that now the
function name 'true' exists and bypasses the shell builtin.

> hello
> $

So the only thing remaining is to determine if it is legal to have a
function override the name of a regular shell builtin.  But
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01
under "Command Search and Execution" states that function names have
priority over regular built-ins (so yes, creating a function named
'true' is doable, although stupid).
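
A small sketch of that search order, without any alias involved (demo_override is a made-up name; the body runs in a subshell so the odd function does not leak out):

```shell
demo_override() (
    true() { echo hello; }   # a function sharing the name of a regular builtin
    true                     # the function wins: prints hello
    command true             # 'command' skips function lookup: prints nothing
)
```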


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: dash drops exported bash functions

2016-02-10 Thread Eric Blake
On 02/10/2016 08:18 AM, Joachim Breitner wrote:
> Dear dash developers,
> 
> a change in 0.5.8, very likely this one
> http://git.kernel.org/cgit/utils/dash/dash.git/commit/?id=46d3c1a614f11f0d40a7e73376359618ff07abcd
> broke the exporting of bash shell functions via the environment.

Not a bug. POSIX says that on shell startup, the behavior of any
inherited environment variables that do not start with a proper shell
name is undefined; and allows shells to scrub such items out of the
environment on startup.  Just because bash does not scrub them (but
instead treats them as shell function imports) does not mean dash has to
behave the same.

That said, preserving any unusable environment variables unchanged,
rather than scrubbing them, may be slightly nicer behavior, but I'm not
sure it's worth the bloat to dash to do so.

> 
> Exporting bash functions via the environment might be a rarely used
> feature, but it is used in practice, unfortunately (otherwise I
> wouldn’t have noticed this).

Exporting bash functions is only usable if you plan on directly invoking
bash.  Don't drag dash into the mess.  Inserting a dash child in between
a bash grandparent and its bash grandchild means all bets are off for
whether the grandparent can export anything to the grandchild.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins

2015-05-22 Thread Eric Blake
On 05/21/2015 10:45 PM, Herbert Xu wrote:
>> Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
>> guaranteed to be compatible; if you just set LC_COLLATE but leave
>> LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
>> collation that assumes one character set while still living in a ctype
>> that assumes another, and get garbled results.
> 
> Show me an actual pair of values for these two that produce
> incorrect results for mkbuiltins and I'll happily change both.

'sort -b' uses isspace() to determine which characters to strip.  There
are locales with a larger set of characters where isspace() returns true
than for the LC_CTYPE=C locale.  Suppose that I can find a single-byte
locale where isblank('\xff') is true.  If that is the case, then the
input '\xffa\nb\n' will sort differently for 'LC_ALL=C sort -b' (output
'b\n\xffa\n') than for 'LANG=C LC_CTYPE=$locale sort -b' (output
'\xffa\nb\n') because the change in CTYPE changes whether the \xff is
ignored as a blank or included as part of the name being sorted.

However, the man pages for 'locale(1)' and 'localedef(1)' did not make
it obvious for me how to perform a search that would easily find such a
locale, so I'm open to suggestions on how to prove my point via more
than just analysis.

And there's still the point that mkbuiltins is being run on controlled
input, where you are sticking only to a subset of characters that happen
to be portable (that is, you are unlikely to be tripped up by a locale
where \xff is a blank, since you are not using \xff in your input).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins

2015-05-21 Thread Eric Blake
On 05/21/2015 10:25 PM, Herbert Xu wrote:
> Fredrik Fornwall fred...@fornwall.net wrote:
>> In mkbuiltins LC_COLLATE is set, but since "The value of the LC_ALL
>> environment variable has precedence over any of the other environment
>> variables starting with LC_"
>> (http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html), this
>> has no effect when LC_ALL is set.
>>
>> This breaks when having e.g. LC_ALL=en_US.UTF-8 during make, which
>> causes the test case
>>    dash -c :
>> to fail, probably due to broken ordering in builtins.c. The patch
>> corrects that by setting LC_ALL instead of LC_COLLATE.
> 
> This causes any errors printed by sort to come out in English.

Why do you care whether any errors printed by sort are in the C locale
(in English) rather than localized?  Ideally, there won't be any sort
errors in the first place, because this tool is run on controlled input
as part of the build process.

 
> Please fix this by simply setting LC_ALL to empty alongside
> LC_COLLATE=C.

Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
guaranteed to be compatible; if you just set LC_COLLATE but leave
LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
collation that assumes one character set while still living in a ctype
that assumes another, and get garbled results.
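
A sketch of the LC_ALL approach in isolation (sorted_c is a hypothetical name; this is not the mkbuiltins code itself):

```shell
# Hypothetical helper: sort stdin in the C locale, whatever the caller's
# environment says.
sorted_c() {
    # LC_ALL overrides LANG and every other LC_* variable for this one
    # command, so collation and character classification always agree.
    LC_ALL=C sort
}
```

For example, "printf 'b\nB\na\n' | sorted_c" yields B, a, b (byte order) even when the surrounding environment has LC_ALL=en_US.UTF-8.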

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] Fix variable assignments in function invocations

2015-01-09 Thread Eric Blake
On 01/09/2015 10:17 AM, Harald van Dijk wrote:
> Hello all,
> 
> A long-standing problem with dash has been how it deals with variable
> assignments in function invocations, and several packages are affected
> by it, two I've come across recently being autogen and pkg-config (only
> their test suites, luckily).
> 
> A short test script:
> 
>   f() {
>     echo inside f, VAR is $VAR
>     sh -c 'echo inside sh called from f, VAR is $VAR'
>   }
> 
>   VAR=value f

This behavior is tricky.  Here's the latest POSIX wording:
http://austingroupbugs.net/view.php?id=654#c1559

  * If the command name is a function that is not a standard
utility implemented as a function, variable assignments shall
affect the current execution environment during the execution
of the function. It is unspecified:

  - Whether or not the variable assignments persist
after the completion of the function

  - Whether or not the variables gain the export
attribute during the execution of the function

  - Whether or not export attributes gained as a result of the
variable assignments persist after the completion of
the function (if variable assignments persist after the
completion of the function)

So the existing dash behavior is compliant, even if different from bash.
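
For script writers who need the variable visible to children of the function in every shell, a sketch that avoids all three unspecified points by doing the assignment and export explicitly inside a subshell (call_f_with_var is a name I made up; f mirrors the test script):

```shell
f() {
    sh -c 'printf "%s\n" "inside sh called from f, VAR is $VAR"'
}

# Hypothetical wrapper: run f with VAR set and exported, for this call only.
call_f_with_var() (
    # Subshell body: the value and the export attribute are both in
    # effect during the call, and both vanish when the subshell exits.
    VAR=$1
    export VAR
    f
)
```

'call_f_with_var value' prints "inside sh called from f, VAR is value" and leaves VAR in the calling shell untouched.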

 
> Quoting SUSv4 Shell Command Language 2.9.1 Simple Commands:
> 
>   If no command name results, variable assignments shall affect the
>   current execution environment. Otherwise, the variable assignments
>   shall be exported for the execution environment of the command and
>   shall not affect the current execution environment (except for
>   special built-ins).

This is the text that was rendered obsolete by the above POSIX bug 654.

> Fixing this seems trivial, see the attachment, and the test suites of
> both autogen and pkg-config pass with this change. Does this look correct?

I have no opinion on whether to take the patch in order to behave more
like bash, or whether to tell script-writers to fix their script to
avoid unspecified behavior because dash is already compliant in
providing a different behavior than bash.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: if [ s1 > s2 ] broken, writing a s2 file

2014-12-08 Thread Eric Blake
On 12/08/2014 10:32 AM, solsTiCe d'Hiver wrote:
> hello,
> following that bug
> https://bugs.launchpad.net/ubuntu/+source/update-notifier/+bug/1400357,
> I follow through to investigate and I found out
> that whatever I try, when comparing 2 strings I always end up with a
> file written to disk
> 
> From the man page
>  test expression
>   [ expression ]
>  [...]
>  s1 > s2   True if string s1 comes after s2 based on the ASCII
>  value of their characters.

You HAVE to escape the > so that it is interpreted as an argument and
not a redirection operator.  The bug is not in dash, but in your usage.

 
> when I try to use it:
> a=ert
> b=aze
> if [ $a > $b ] ; then

Wrong.  Use:

if [ $a \> $b ]; then


 
> so this if syntax is broken or I don't know how to use it.

The latter.

 
> Also it is really dangerous to use a syntax similar to file
> redirection and this is exactly what is happening here.

POSIX is proposing the addition of the shell builtin [[ ]], where
because it is a syntactical part of the shell, it would have safe
semantics (that is, [[ $a > $b ]] would be perfectly safe and do the
right thing). But until the POSIX standardization is complete, dash does
not implement [[; and as long as only '[' is portable (with its
unfortunate but historically-mandated semantics of operating as if it
were NOT a builtin, in that shell parsing happens before test sees its
arguments), then you have to quote anything that might otherwise be
misinterpreted during parsing.
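
A sketch of the quoted form wrapped in a function (str_after is a made-up name; note that the > operator of test is documented by dash and bash, and I am assuming one of those shells here):

```shell
# Hypothetical helper: succeed if string $1 sorts after string $2.
str_after() {
    # The backslash stops the shell from parsing > as a redirection, so
    # test receives three arguments and no file is created.
    [ "$1" \> "$2" ]
}

a=ert
b=aze
if str_after "$a" "$b"; then
    echo "$a comes after $b"
fi
```

Quoting "$1" and "$2" also protects against empty values or values containing spaces.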

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: POSIX compliant trap signal names

2014-10-30 Thread Eric Blake
On 10/30/2014 09:23 AM, Sylvain Bertrand wrote:
 Hit the issue while compiling linux 3.16.3 with dash,
 ${linux-src}/scripts/link-vmlinux.sh line 114 .
 The signal names for trap built-in must be prefixed with SIG to be
 POSIX compliant. dash expect trap signal names without a SIG prefix.

Wrong.  Per
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#trap

The condition can be EXIT, 0 (equivalent to EXIT), or a signal specified
using a symbolic name, without the SIG prefix, as listed in the tables
of signal names in the signal.h header defined in XBD Headers; for
example, HUP, INT, QUIT, TERM. Implementations may permit names with the
SIG prefix or ignore case in signal names as an extension.

Thus, POSIX requires 'trap ... INT' to work, but says 'trap ... SIGINT'
and 'trap ... int' are up to the implementation whether they are
supported as an extension.
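
As a small illustration of the portable spelling, the sketch below has a
child shell install a trap with the bare name TERM and then signal itself.
The message text and the use of 'sh -c' as the child are assumptions made
for the example, not anything taken from the kernel script.

```shell
# The portable spelling omits the SIG prefix: TERM, not SIGTERM or term.
# A child shell installs the trap, then sends itself the signal.
result=$(sh -c 'trap "echo caught TERM; exit 0" TERM
kill -s TERM "$$"
echo "trap did not fire"')
echo "$result"
```

Scripts that must run on any POSIX shell should stick to this form, since
the SIG-prefixed and lowercase spellings are optional extensions.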

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[PATCH] [BUILTIN] cd: support drive letters on Cygwin

2014-10-13 Thread Eric Blake
The Cygwin platform supports DOS style drive-letter paths such
as C:\\dir, even though the preferred form is a POSIX-style
/cygdrive/c/dir.  This can be seen by doing things such as
chdir("c:") (which succeeds) followed by getcwd(NULL, 0) (which
returns the normalized /cygdrive/c).  However, dash was trying
to perform local manipulations on the argument to 'cd' prior to
calling into libc, in order to update the state of $PWD and
friends; these manipulations were assuming that the user meant
to change to a relative subdirectory of the current location,
as in './c:', instead of honoring the drive letter.  None of
the other dash builtins take a filename and manipulate it to
affect shell state (some, like 'test', take a file name, but as
stat("c:") works just fine, there is no need to normalize).

This patch has no impact outside of cygwin; on cygwin, it takes
advantage of a native function call to canonicalize any
incoming name into preferred form before updating shell state.

Pre-patch:
$ dash -c 'cd c: && echo $PWD'
dash: 1: cd: can't cd to c:

Post-patch:
$ dash -c 'cd c: && echo $PWD'
/cygdrive/c

Signed-off-by: Eric Blake ebl...@redhat.com
---
 ChangeLog |  4 
 src/cd.c  | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index a466a7f..a745fe7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2014-10-13  Eric Blake  ebl...@redhat.com
+
+   * cd: support drive letters on Cygwin.
+
 2014-09-26  Herbert Xu herb...@gondor.apana.org.au

* Small optimisation of command -pv change.
diff --git a/src/cd.c b/src/cd.c
index 2d9d4b5..a4e024d 100644
--- a/src/cd.c
+++ b/src/cd.c
@@ -38,6 +38,9 @@
 #include <string.h>
 #include <unistd.h>
 #include <limits.h>
+#ifdef __CYGWIN__
+#include <sys/cygwin.h>
+#endif

 /*
  * The cd and pwd commands.
@@ -194,6 +197,17 @@ updatepwd(const char *dir)
char *cdcomppath;
const char *lim;

+#ifdef __CYGWIN__
+   /* On cygwin, thanks to drive letters, some absolute paths do
+  not begin with slash; but cygwin includes a function that
+  forces normalization to the posix form */
+   char pathbuf[PATH_MAX];
+   if (cygwin_conv_path(CCP_WIN_A_TO_POSIX | CCP_RELATIVE, dir, pathbuf,
+                        sizeof(pathbuf)) < 0)
+           sh_error("can't normalize %s", dir);
+   dir = pathbuf;
+#endif
+
cdcomppath = sstrdup(dir);
STARTSTACKSTR(new);
if (*dir != '/') {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'set' leaks garbage from environment

2014-09-30 Thread Eric Blake
On 09/30/2014 09:14 AM, Olof Johansson wrote:
 On 2014-09-30 09:01 -0600, Eric Blake wrote:
 $ dash -c 'unset a|b'
 $ echo $?
 0
 
 Works for me (tested on both Debian package versions 0.5.7-3 (wheezy)
 and 0.5.7-4 (unstable)):

Serves me right for testing on multiple machines :(
I mixed up my test results.

Fedora 20 using dash 0.5.7 works:

$ dash -c 'unset a|b'
dash: 1: unset: a|b: bad variable name
$ rpm -q dash
dash-0.5.7-8.fc20.x86_64

But RHEL 6 fails:

$ dash -c 'unset a|b'
$ rpm -q dash
dash-0.5.5.1-4.el6.x86_64

so this is at least one bug that has already been fixed upstream.

 $ env 'a|b=' dash -c 'set | grep a.b'
 a|b=''
 
 This I can reproduce though.

Meanwhile, I just tested the latest dash.git (commit f21016a12) and this
behavior is no longer present:

$ env 'a|b=' ./src/dash -c 'set | grep a.b'

so it has also been fixed in the meantime.  Sorry for not doing my
homework; nothing to fix here...

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] Support DOS paths in dash

2014-09-30 Thread Eric Blake
[I noticed an old thread in my inbox while packaging dash 0.5.8 for Cygwin]

On 03/28/2013 09:08 AM, Edward Lam wrote [to the cygwin list]:

 The problem is that dash tries to convert c:/windows to an absolute
 path, since it doesn't start with /.  I suppose I could teach dash to
 recognize [letter]:/ as absolute paths, although that makes dash larger,
 and puts a burden on me (since I can guarantee upstream dash won't
 accept such a patch).

 I just don't care enough for DOS paths so I won't fix.

 Me neither.  And since you can use /cygdrive/c, not c:/, I won't bother
 to fix it.


 Hi Folks,

 I finally got down to looking at how to fix this in dash and came up
 with the attached patch (against dash-0.5.7). It's simple enough and so
 cd now works.

 Please consider this for Cygwin.


I'm not interested in burdening the cygwin build of dash with a one-off
patch, so I'd like to gauge the upstream thoughts - is it worth
including platform-specific patches like this (no penalty to build size
of non-cygwin platforms, and on cygwin, it allows 'cd c:/' to behave as
shorthand for 'cd /cygdrive/c/')?  If the patch lands in dash.git, then
I'll rebuild the cygwin port of dash to include a backport (rather than
waiting for 0.5.9 to be released).  If there is no interest, I'd rather
just drop the patch.  The cygwin community already states that
/cygdrive/c notation is the official way to access drive letters, and
that if 'c:/' works it is nice, but it is not a design goal to always
have it work.

 --- src/cd.c  2011-03-15 03:18:06.0 -0400
 +++ src/cd.new.c  2013-03-28 11:03:32.649576500 -0400
 @@ -38,6 +38,9 @@
  #include <string.h>
  #include <unistd.h>
  #include <limits.h>
 +#ifdef __CYGWIN__
 +#include <sys/cygwin.h>
 +#endif
  
  /*
   * The cd and pwd commands.
 @@ -194,6 +197,11 @@
   char *cdcomppath;
   const char *lim;
  
 +#ifdef __CYGWIN__
 +char pathbuf[PATH_MAX + 1];
 +cygwin_conv_to_full_posix_path (dir, pathbuf);

By the way, cygwin_conv_to_full_posix_path() is deprecated (it suffers
from possible buffer overflow); these days, it's preferred to use:

cygwin_conv_path (CCP_WIN_A_TO_POSIX | CCP_RELATIVE, string, pathbuf,
  sizeof(pathbuf))

So, if there is interest in this patch upstream, I can respin it.

 +  dir = pathbuf;
 +#endif
   cdcomppath = sstrdup(dir);
   STARTSTACKSTR(new);
   if (*dir != '/') {
 
 


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Line continuation and variables

2014-08-26 Thread Eric Blake
On 08/26/2014 06:15 AM, Oleg Bulatov wrote:
 Hi!
 
 While playing with sh generators I found that dash and bash have different
 interpretations for <slash><newline> sequence.
 
 $ dash -c 'EDIT=xxx; echo $EDIT\
 OR'
 xxxOR

Buggy.

 $ bash -c 'EDIT=xxx; echo $EDIT\
 OR'
 /usr/bin/vim

Correct behavior.

 
 $ dash -c 'echo $\
 (pwd)'
 $(pwd)
 
 Is it undefined behaviour in POSIX?

No, it's well-defined, and dash is buggy.  POSIX says:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03

the shell shall break its input into tokens by applying the first
applicable rule below to the next character in its input

Rule 4 covers backslash handling, while rule 5 covers locating the end
of a word to be subject to $ expansion.  Therefore, rule 4 should happen
first.  Rule 4 defers to the section on quoting, with the caveat that
newline joining is the only substitution that happens immediately as
part of the parsing:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02

If a newline follows the backslash, the shell shall interpret this
as line continuation. The backslash and newline shall be removed
before splitting the input into tokens. Since the escaped newline is
removed entirely from the input and is not replaced by any white space,
it cannot serve as a token separator.

So the fact that dash is treating the elided backslash-newline as a
token separator, and parsing your input as if ${EDIT}OR instead of
${EDITOR} is a bug in dash.
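
A short sketch of the rule, for reference.  The first half shows the plain
joining behavior that every shell I know of gets right; the second half is
the case from this report, where the value assigned to EDITOR is a made-up
placeholder and the result depends on the shell being conforming.

```shell
# Backslash-newline is deleted before tokenizing, so the two halves join
# into a single word with no separator between them.
joined=$(echo foo\
bar)
echo "$joined"

# The same rule applies in the middle of a parameter name: a conforming
# shell reads the next two lines as 'echo $EDITOR'.
# EDITOR is set here purely for illustration.
EDITOR=demo-editor
EDIT=xxx
echo $EDIT\
OR
```

Generators that wrap long lines can sidestep the question entirely by never
splitting inside a word that starts with $.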

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH dash] [BUILTIN] ensure LC_COLLATE is not overriden

2014-08-05 Thread Eric Blake
On 08/05/2014 10:40 AM, Chema Gonzalez wrote:
 If the user environment has either LC_ALL or LANG defined, the setting
 of LC_COLLATE in src/mkbuiltins is overridden. With a non-POSIX locale,
 the orders of dotcmd (remember that '.' is 0x2e in ascii) and truecmd
 (':' is 0x3a in ascii) are reversed, which makes the : command fail
 in the bsearch.
 

 - }}' $temp | LC_COLLATE=C sort -k 1,1 | tee $temp2 | awk '{
 + }}' $temp | LC_ALL= LANG= LC_COLLATE=C sort -k 1,1 | tee $temp2 | awk '{

Setting LC_ALL= to the empty string risks implementation-defined
behavior.  Also, LC_ALL overrides LANG and LC_COLLATE.  It should be
sufficient to merely do:

}}' $temp | LC_ALL=C sort -k 1,1 | tee $temp2 | awk '{
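
As a quick check of that claim, the sketch below sorts the two characters
from the report while deliberately setting LANG and LC_COLLATE to something
else.  The en_US.UTF-8 locale name is only an assumed example; it does not
need to exist on the system, because LC_ALL wins.

```shell
# '.' is 0x2e and ':' is 0x3a, so byte order puts '.' first.
# LC_ALL takes precedence over both LANG and LC_COLLATE, so a single
# assignment is enough whatever the caller's environment contains.
sorted=$(printf '%s\n' ':' '.' |
    LANG=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)
printf '%s\n' "$sorted"
```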

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH dash] [BUILTIN] ensure LC_COLLATE is not overriden

2014-08-05 Thread Eric Blake
On 08/05/2014 11:12 AM, Chema Gonzalez wrote:
 Setting LC_ALL= to the empty string risks implementation-defined
 behavior.  Also, LC_ALL overrides LANG and LC_COLLATE.  It should be
 sufficient to merely do:

 }}' $temp | LC_ALL=C sort -k 1,1 | tee $temp2 | awk '{
 
 Maybe:
 }}' $temp | LC_ALL=C LANG=C sort -k 1,1 | tee $temp2 | awk '{

No need to specify LANG=C when LC_ALL is set.  I stand by my shorter line.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] \e in echo and printf builtins

2014-06-30 Thread Eric Blake
On 06/28/2014 11:33 AM, Paul Gilmartin wrote:
 
 OTOH, there's a POSIX requirement that builtins be indistinguishable
 (except in performance) from the corresponding executables. 


The POSIX requirement only applies to portable uses of the builtin - i.e.
those that are prescribed by POSIX.  Since POSIX does not require \e,
dash is not failing compliance, even if it differs from extensions
provided by corresponding executables.  I do not think dash needs to
bloat for \e unless POSIX standardizes it first.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: sed script fails to run in dash

2013-11-22 Thread Eric Blake
On 11/22/2013 11:11 AM, Tormen wrote:

 sed -e 1$'{w/dev/stdout\n;d}' -i /tmp/x
 
 in a dash script will yield the error message:
 
 sed: -e expression #1, char 2: unknown command: `$'
 
 But why ? :(

Because $'' is not (yet) in POSIX.  It will be required in a future
release, but dash hasn't implemented it yet.

http://austingroupbugs.net/view.php?id=249
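
Until then, a script can build the same characters portably.  The following
is a minimal sketch, with variable names of my own choosing, of stand-ins
for $'\t' and $'\n' that work in dash today.

```shell
# Portable stand-in for $'\t': let printf interpret the escape.
tab=$(printf '\t')

# Command substitution strips trailing newlines, so a newline has to be
# written literally instead of produced with $(printf '\n').
nl='
'

printf 'a%sb%sc\n' "$tab" "$nl"
```

The variables can then be spliced into a double-quoted sed script wherever
the $'...' form would have supplied the character.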

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: test incorrectly rejecting valid expression with confusing ! placement

2013-09-03 Thread Eric Blake
On 09/03/2013 07:56 PM, Herbert Xu wrote:
 Harald van Dijk har...@gigawatt.nl wrote:
 Hi,

 Now that Herbert fixed the reported crash in test (in a far simpler
 manner than I had suggested, which I like), I did some more testing, and
 came across one case that does not currently work, and did not work in
 the past, but is perfectly valid:

 $ src/dash -c 'test ! ! = !'
 src/dash: 1: test: =: unexpected operator
 
 Agreed.
  
 $ src/dash -c 'test ! -o !'
 src/dash: 1: test: -o: unexpected operator
 
 Nope, the rule is quite clear that it only applies to binary
 primaries, not operators.  -o is an operator.

Huh?
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
states that there are only two operators ! and (), and specifically
mentions that -a and -o are binary primaries:

expression1 -a  expression2
[OB XSI] [Option Start]
True if both expression1 and expression2 are true; otherwise, false.
The -a binary primary is left associative. It has a higher precedence
than -o. [Option End]
expression1 -o  expression2
[OB XSI] [Option Start]
True if either expression1 or expression2 is true; otherwise, false.
The -o binary primary is left associative. [Option End]

test ! -o ! is a three-argument test, where $2 (-o) is a binary
primary, so it is the binary test of $1 and $3, and the end result is an
exit status of 0.  Bash and ksh get it right, dash fails.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: Crash on valid input

2013-04-08 Thread Eric Blake
On 04/08/2013 09:12 PM, Dan Kegel wrote:
 Yes, my script was crap, I've fixed it.
 
 Here's the reproducer.  Called with foo unset.  I think it doesn't
 crash without -x.
 
 #!/bin/dash
 set -x
 test ! $foo

The 'set -x' was indeed the key to reproducing the problem.  In fact,
this is the shortest I could make it:

dash -cx 'test !'

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH] [BUILTIN] Allow SIG* signal names.

2012-07-02 Thread Eric Blake
On 07/02/2012 12:53 PM, Isaac Jurado wrote:
 On Mon, Jul 2, 2012 at 4:22 PM, Eric Blake ebl...@redhat.com wrote:
 On 07/02/2012 08:11 AM, Paul Gilmartin wrote:
 On Jul 2, 2012, at 07:51, Eric Blake wrote:

 ... non-required bloat ...

 The key phrase.  And one value of a shell lacking such
 extensions is that it provides an excellent test bed for
 code intended to be portable within the POSIX spec.

 That argues that we should drop our strcasecmp() for the much simpler
 strcmp(), in order to remove the bloat we already have.
 
 I guess my patch has no chance to be accepted.

I'm not the maintainer, so my decision is not indicative of what the
dash maintainer will choose.  But my personal preference would be that
we change this area of code, either to:

1. be lighter-weight (drop strcasecmp, which is locale-dependent, and
replace it with strcmp)

2. be more user-friendly (accept optional case-insensitive SIG prefix)

Both approaches are permitted by POSIX, so it boils down to a judgment
call of whether providing useful extensions or providing a minimally
compliant shell is more important.

  But I'm still curious
 about what kind of bloat you are referring to.  I'm assuming it's not
 code bloat in terms of lines of code.

Even one byte larger in the final executable size has been deemed bloat
on this list in the past.  Dash prides itself on being minimalistic, but
you happened to point out an area of code that is not currently minimal.

 
 If the signal name to number conversion seems too expensive (linear
 search multiplied by the string lengths, whether it is case sensitive or
 not), there is a much more elegant solution: perfect hashing.

Indeed, that would provide faster lookup, but it would also cost more
executable size (the storage requirements for a perfect hash table are
larger than the size of a loop comparison); I don't know whether the
preference is for speed, for minimal size, or for a hybrid of the two
(where larger size is okay only if it proves to have more speed).  So
hopefully the dash maintainer will chime in on the topic.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org







Re: [PATCH] var.c: check for valid variable name before printing in export -p

2012-02-25 Thread Eric Blake
On 02/25/2012 07:31 AM, Herbert Xu wrote:
 On Sat, Feb 25, 2012 at 03:30:04PM +0100, Jilles Tjoelker wrote:

 Most shells pass the environment variable through, such as bash, zsh,
 ksh93 and most ash derivatives. However, the original Bourne shell and
 pdksh/mksh do not.
 
 Do you know of any genuine uses of such environment variables?

POSIX states that applications must not rely on such pass-through:
http://austingroupbugs.net/view.php?id=168

So while it might indeed be useful to pass through invalid names, such
an application is broken for expecting it to work, and I'm okay with
this patch as-is.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: evaluation of env variables in DASH

2011-10-19 Thread Eric Blake

On 10/19/2011 03:24 PM, Dima Sorkin wrote:

Hi.
   The following DASH behaviour seems buggy to me


The only bug here is your expectations.



-
$ export A='\n'
$ echo $A



Passing a literal backslash to echo is non-portable.  POSIX even says 
so.  And bash can match dash behavior:


$ (shopt -s xpg_echo; A='\n'; echo -$A-)
-
-

eblake@office (0) ~/libvirt
$ (shopt -u xpg_echo; A='\n'; echo -$A-)
-\n-

Fix your shell script to use printf instead of echo if the thing you are 
printing might contain a backslash.
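
A minimal sketch of that fix, reusing the variable from the report:

```shell
A='\n'
# printf '%s\n' prints its argument verbatim: backslashes in the data are
# never interpreted, whatever the shell or its echo settings.
out=$(printf '%s\n' "-$A-")
printf '%s\n' "$out"
```

Only the format string is subject to escape processing, so keeping the data
in a %s argument gives the same output in dash and bash.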


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org


Re: [PATCH 1/2] [SHELL] Allow building without LINEO support.

2011-08-17 Thread Eric Blake

On 08/17/2011 12:04 AM, Harald van Dijk wrote:

On Tue, 2011-08-16 at 20:12 -0500, Jonathan Nieder wrote:

David Miller wrote:


[Subject: [SHELL] Allow building without LINEO support.]


Thanks!  Debian has been using something like this (but unconditional)
to convince autoconf not to use dash as CONFIG_SHELL, to work around
bugs in various configure scripts[1].  I imagine other users might
want the same thing, so a patch like this seems like a good idea.


If you don't mind me asking, if you want configure scripts to run from
bash, why not simply run configure scripts from bash, instead of running
them from sh and trusting that sh will call bash if it is really some
other shell?


And remember, most configure scripts already support that:

CONFIG_SHELL=path/to/bash path/to/bash ./configure

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org


positional argument bug

2011-05-05 Thread Eric Blake
[originally brought up on the bash list as a NetBSD bug, but dash is
also affected]

On 05/05/2011 08:11 AM, Eric Blake wrote:
 I'd call that a pretty serious incompatibility on the part of ash and its
 descendants (BSD sh, dash, etc.).  There's no good reason that

 set -- a b c d e f g h i j
 echo $10

 should echo `j'.
 
 Also a POSIX violation:
 
 http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02
 
 The parameter name or symbol can be enclosed in braces, which are
 optional except for positional parameters with more than one digit or
 when parameter is followed by a character that could be interpreted as
 part of the name.

Additionally from POSIX:

If the parameter name or symbol is not enclosed in braces, the
expansion shall use the longest valid name (see XBD Name)

In the shell command language, a word consisting solely of underscores,
digits, and alphabetics from the portable character set. The first
character of a name is not a digit.

Therefore, in $10, 10 is not a name, so the longest name is the empty
string, and the single-character symbol is used instead, such that this
MUST be parsed as ${1}0, not as ${10}.
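
For scripts that have to run on both fixed and unfixed shells, spelling the
braces out removes the ambiguity.  A small sketch (variable names are mine):

```shell
set -- a b c d e f g h i j

# Braces make the intent explicit in either direction.
tenth=${10}        # the tenth positional parameter
first_zero=${1}0   # the first parameter followed by a literal 0
echo "$tenth $first_zero"
```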

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: `local' built-in POSIX?

2011-03-26 Thread Eric Blake
On 03/26/2011 04:50 PM, Michael Witten wrote:
 I can't find POSIX documentation for the `local' built-in, which is
 available in both dash and bash for the creation of function-local
 variables.
 
 Is it not standard POSIX? If it is not, should it be removed from dash?

No, it is not standard POSIX (yet).  There has been talk on the Austin
Group mailing list of standardizing local (perhaps by the name typeset)
for the next revision; the biggest issue is that ksh uses typeset only
for statically scoped variables, while bash uses it only for dynamically
scoped variables, so a consensus has to be reached among shell writers
which scoping rules to standardize.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: Dash's web presence

2011-03-08 Thread Eric Blake
On 03/08/2011 01:08 AM, Dan Muresan wrote:
 Oh, you do have a GIT repository. Kudos for that.

And when you consider that bash lacks even a public repository, and your
only recourse is massive inter-version diffs, dash is already worlds
ahead in that regard.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: setvar MIA?

2011-01-11 Thread Eric Blake
On 01/11/2011 08:54 AM, Aragon Gouveia wrote:
 Hi,
 
 I'm working on making a number of shell scripts cross compatible between
 FreeBSD and Linux, but one thorn in my side has been dash's lack of a
 setvar builtin.  Does anyone know if this is a work in progress, or a
 decidedly void feature in dash?

Decidedly missing.  POSIX doesn't require it.  Neither bash nor ksh
provides setvar as a builtin, either.  And what does setvar do anyways?
Perhaps it is some alias or shell function that you have inherited from
startup files in one of your other shells, but I've never heard of a
'setvar' program.  So why bloat dash to include it?

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: setvar MIA?

2011-01-11 Thread Eric Blake
On 01/11/2011 09:54 AM, Aragon Gouveia wrote:
 I wasn't sure of its status in POSIX.  It is useful for declaring
 variable variables - tidier than eval and I imagine faster, eg.
 
 index=1
 setvar var_${index} value
 
 Will emulate it with a local function - thanks.

Indeed, it looks like FreeBSD introduced it as shorthand for:

setvar() { eval $1=\$2; }

The speed difference between that function doing an eval and a shell
builtin would be in the noise.  I don't know why FreeBSD even bothered
to pollute the namespace with a builtin like that.
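
If you do go the shell-function route, a slightly more defensive sketch
might look like the following.  The name check is my own hypothetical
addition, not something FreeBSD's builtin is known to do in this form; it
just keeps eval from ever seeing anything but a plain variable name.

```shell
# Emulation sketch of setvar as a shell function.
setvar() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*)
            echo "setvar: bad variable name: $1" >&2
            return 1 ;;
    esac
    # $1 is expanded now; \$2 is left for eval, so the value itself is
    # never re-parsed as shell code.
    eval "$1=\$2"
}

index=1
setvar "var_${index}" 'two words; $(not run)'
printf '%s\n' "$var_1"
```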

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: static vs. dynamic scoping

2010-11-15 Thread Eric Blake
On 11/15/2010 02:11 PM, Cedric Blancher wrote:
 
 Why is the debate static-vs-dynamic scoping coming up again?

Because before 'typeset' can be standardized in POSIX, we have to get
consensus from all the shell implementers that they will agree to
implement static scoping.

For ksh, the question is moot - ksh93 already does static only.

For dash, the question is valid - the current dash implementation is
dynamic only, but given that switching to static only could probably be
made more efficient, and dash values efficiency, it's a reasonable goal.

For bash and zsh, which currently are dynamic only, the problem stems
that there are now a number of shell script libraries for these two
shells that have exploited dynamic scoping, and which would break if we
aren't careful to standardize something that can still allow dynamic
scoping as an extension.  In other words, this was a probe of the
various shell implementers to figure out how easily static scoping can
be added on after the fact to a dynamic scoping implementation, so that
the shell could conform to a future POSIX revision that mandates static
and permits dynamic as an extension.

 With this background I doubt any proposal for dynamic scoping will
 make it into the next POSIX standard.

There's no desire for dynamic scoping in POSIX; David Korn has already
made that point clear on the Austin Group mailing list.  Rather, there
is a desire for minimal effort for complying with a new POSIX
requirement of static scoping on shells that currently lack it, as well
as backwards compatibility for shells that wish to continue to provide
dynamic scoping as an extension to the standard.

My take of the Austin Group list discussion is that the next revision of
the standard is most likely to have consensus if it just mandates
'typeset' for static scoping, and leaves 'local' as an implementation
extension for dynamic scoping.  Please, chime in on the Austin Group
conversation if you have something useful to add.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: static vs. dynamic scoping

2010-11-10 Thread Eric Blake
[redirecting back to the list, so others can benefit]

On 11/10/2010 02:16 AM, Marc Herbert wrote:
 Le 09/11/2010 21:52, Eric Blake a écrit :
 I'm trying to standardize the notion of local variables for the next
 revision of POSIX, but before I can do so, I need some feedback on two
 general aspects:
 [...]
 Here's a sample shell script that illustrates the difference between the
 two scoping methods.
 
 Hi Eric,
 
   I found your sample script quite confusing. To make your point, does
 this script really need to:
 - use unquoted language keywords as string values?

No; I could have used other strings.

 - use deprecated typeset instead of declare?

Yes - the current Austin Group thoughts are to standardize 'typeset' and
NOT 'local', since 'typeset' can be used with arguments outside of
functions, and more existing shells provide 'typeset' than 'local' (dash
being the odd one out) or 'declare'.  Shells can continue to provide
'local' as a synonym for the most basic use of typeset.

 - use the not (or less?) standard function keyword?

Yes - ksh93 ONLY supports function-local scoping when using the function
keyword, rather than when using POSIX functions (although David Korn
agreed that if POSIX standardizes function-local scoping, he'd make the
next build of ksh support it in POSIX functions).

So, here's the example again, with those points addressed:

# Demonstrate ksh local scoping is static - requires ksh's 'function'

$ ksh -c 'function f1 { typeset a=temp; f2; echo in f1: $a; };
function f2 { echo in f2: $a; a=changed; }; a=global; f1; echo top
level: $a'
in f2: global
in f1: temp
top level: changed

# Demonstrate that with POSIX functions, ksh has global scoping

$ ksh -c 'f1 () { typeset a=temp; f2; echo in f1: $a; }; f2 () { echo
in f2: $a; a=changed; }; a=global; f1; echo top level: $a'
in f2: temp
in f1: changed
top level: changed

# Demonstrate that dash local scoping is currently dynamic

$ dash -c 'f1 () { local a=temp; f2; echo in f1: $a; }; f2 () { echo
in f2: $a; a=changed; }; a=global; f1; echo top level: $a'
in f2: temp
in f1: changed
top level: global

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





static vs. dynamic scoping

2010-11-09 Thread Eric Blake
On the Austin Group mailing list, David Korn (of ksh93 fame)
complained[1] that bash's 'local' uses dynamic scoping, but that ksh's
'typeset' uses static scoping, and argued that static scoping is saner
since it matches the behavior of declarative languages like C and Java
(dynamic scoping mainly matters in functional languages like lisp):

[1]
https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=14951

I'm trying to standardize the notion of local variables for the next
revision of POSIX, but before I can do so, I need some feedback on two
general aspects:

1. Implementation aspect:
  How hard would it be to add static scoping to dash?
  Is it something that should be added in addition to dynamic scoping,
via the use of an option to select the non-default mode (for example,
'local -d' to force dynamic, 'local -s' to force static, and 'local' to
go with default scoping)?
  Or should dash switch entirely to static scoping (my gut feel is that
static scoping may be more efficient to implement, which fits in line
with dash's desire to be as lean as possible)?

2. User aspect:
  Is anyone aware of a script that intentionally uses the full power of
dynamic scoping available through 'local' which would break if scoping
switched to static?

Here's a sample shell script that illustrates the difference between the
two scoping methods (note that ksh only provides nested scoping via its
typeset builtin, and only when using the function reserved word).

$ ksh -c 'function f1 { typeset a=local; f2; echo $a; };
  function f2 { echo $a; a=changed; };
  a=global; f1; echo $a'
global
local
changed

$ dash -c 'f1 () { local a=local; f2; echo $a; };
  f2 () { echo $a; a=changed; };
  a=global; f1; echo $a'
local
changed
global

In static scoping, function f2 does not shadow a declaration of a, so
references to $a within f2 refer to the global variable.  The local
variable a of f1 can only be accessed within f1; the behavior of f2 is
the same no matter how it was reached.

In dynamic scoping, function f2 looks up its call stack for the closest
enclosing scope of a variable named a, and finds the local one declared
in f1.  Therefore, the behavior of f2 depends on how f2 is called.
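
The caller dependence can be seen directly by invoking f2 both ways.  This
is a sketch that assumes a shell whose 'local' is dynamically scoped, as in
current dash and bash; the message text is only illustrative.

```shell
f2() { echo "f2 sees: $a"; }
f1() { local a=local; f2; }

a=global
direct=$(f2)    # called from top level: no enclosing local, so the global
via_f1=$(f1)    # called from f1: dynamic scoping finds f1's local
echo "$direct"
echo "$via_f1"
```

Under static scoping both calls would report the global value.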

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org






Re: [PATCH] [INPUT] Catch attempts to run a directory as a script

2010-10-06 Thread Eric Blake

On 10/06/2010 04:55 AM, Jonathan Nieder wrote:

But POSIX makes it clear enough that in sh command_file,
command_file is supposed to be a file, not a directory.  So
diagnose this with an error message and exit with status 2.

[...]

Is this required by POSIX? If not this is simply making dash
bigger for no good reason.


Not clear.  I suppose POSIX usually doesn't require anything when the
caller screws up.


POSIX requires that input files to sh shall be text files; directories 
do not qualify for this.

http://www.opengroup.org/onlinepubs/9699919799/utilities/sh.html
The input file shall be a text file, except that line lengths shall be 
unlimited. 


However, that is a requirement on the user, not the shell; so running 
'sh /' is a constraint violation by the user, and leaves behavior up to 
the shell.



Under OPERANDS[2]: if the path contains a slash, all the standard says
is the implementation attempts to read that file.  If the path does
not contain a slash and the file is not in the working directory, the
implementation _may_ perform a search as described in Command Search
and Execution.


It's more than just MAY; it's a requirement:
http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01

If the command name contains at least one slash, the shell shall 
execute the utility in a separate utility environment with actions 
equivalent to calling the execve() function...


If the execve() function fails due to an error equivalent to the 
[ENOEXEC] error, the shell shall execute a command equivalent to having 
a shell invoked with the command name as its first operand




During that search, after execve() fails, if the executable file is
not a text file, the shell _may_ bypass this command execution. In
this case, it shall write an error message, and shall return an exit
status of 126. (emphasis mine).


But yes, that same section is clear that for both command searches along 
PATH for a word without slash, and for a direct command with a slash, if 
execve() fails with ENOEXEC (as it does for directories), then it is 
optional whether the shell bypasses attempts to read the file because it 
was not a text file.


On the other hand, in Linux, execve(".",...) fails with EACCES, as
permitted by the standard:


http://www.opengroup.org/onlinepubs/9699919799/functions/execve.html
[EACCES] ...or the new process image file is not a regular file and the 
implementation does not support execution of files of its type.


And since EACCES is not the same class as ENOEXEC, there is no 
requirement for the shell to attempt to execute the same file.  So, 
rather than stat()ing the argument in advance and checking for S_ISDIR, 
it seems like it would be simpler to check after the execve() attempt 
for EACCES and blindly set $? to 126 in that case (since you already 
have to check for ENOEXEC).
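
As a rough check from the outside (a sketch, not part of any patch):
when a directory is named as an ordinary command, the normal
command-execution path is expected to report status 126.  This assumes
that / is a directory the shell cannot execute; the variable name status
is arbitrary.

```shell
# Name a directory as a command and capture the resulting exit status.
status=0
/ 2>/dev/null || status=$?

echo "status=$status"
```

The same status is what the EACCES check suggested above would give when
a directory is passed as the script operand.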



So this behavior is allowed as an optional subset of an optional
behavior.  That may have guided the bash implementors:

  $ bash directory
  directory: directory: is a directory
  $ echo $?
  126

It's probably not required.


Additionally, the standard REQUIRES that 'sh -c "exec /"' shall fail
with status 126:


http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exec
If command is found, but it is not an executable utility, the exit 
status shall be 126.


Right now, dash gets this wrong:

dash -c 'exec /'; echo $?
exec: 1: /: Permission denied
2

And since you already have the code in dash to detect failure to 'exec' 
a directory, you should be able to reuse that code when detecting 
failure to run a directory as a script, as in 'dash .'.


[Hmm, bash also gets it wrong:
bash -c 'exec .'; echo $?
bash: line 0: exec: .: not found
127
even though . should always be found]

--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org
--
To unsubscribe from this list: send the line unsubscribe dash in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [EVAL] with set -e exit the shell if a subshell exits non-zero

2010-06-28 Thread Eric Blake
On 06/28/2010 01:22 AM, Cristian Ionescu-Idbohrn wrote:
 Has bash's behaviour changed recently (I'm using an ancient
 version)?

Yes - bash 4.1 tries harder to be compliant with the recent Austin Group
interpretations (and more like ksh).

 
 bash 3.2.39 and 4.0.37 are behaving as dash without the suggested
 patch.  Still.  What is the correct behaviour?  That should be the
 essential matter IMO, not what others do.  So, why should this fail:
 
   $ dash -c 'set -e; false; echo here'
 
 and this succeed?
 
   $ dash -c 'set -e; (false); echo here'

According to the Austin Group:
http://austingroupbugs.net/view.php?id=52

the desired behavior is:

Replace the description of -e with:

  -e When this option is on, when any command fails (for any of the
  reasons listed in [xref to 2.8.1] or by returning an exit status
  greater than zero) the shell immediately shall exit with the
  following exceptions:

  1) The failure of any individual command in a multi-command
  pipeline shall not cause the shell to exit. Only the
  failure of the pipeline itself shall be considered.

  2) The -e setting shall be ignored when executing the compound
  list following the while, until, if, or elif reserved word,
  a pipeline beginning with the ! reserved word, or any
  command of an AND-OR list other than the last.

  3) If the exit status of a compound command other than a
  subshell command was the result of a failure while -e was
  being ignored, then -e shall not apply to this command.

  This requirement applies to the shell environment and each
  subshell environment separately. For example, in

  set -e; (false; echo one) | cat; echo two

  the false command causes the subshell to exit without executing
  echo one; however, echo two is executed because the exit status
  of the pipeline (false; echo one) | cat is zero.

Per these rules, both 'set -e; false; echo here' and 'set -e; (false);
echo here' are silent in bash 4.1.  The fact that dash is not silent
when a subshell exits with non-zero status is at odds with the above
Austin Group ruling.
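
A quick way to compare a given shell against those rules is to run the
two cases in a child shell and capture what they print.  This is only a
sketch: it assumes that sh on PATH is the shell under test, and the
variable names are arbitrary.

```shell
# Pipeline example from the interpretation: the subshell stops at false,
# but the pipeline status is that of cat, so "two" is still printed.
pipeline_out=$(sh -c 'set -e; (false; echo one) | cat; echo two')

# A failing subshell as a command of its own: a shell following the
# interpretation exits before reaching echo.
subshell_status=0
subshell_out=$(sh -c 'set -e; (false); echo here') || subshell_status=$?

echo "pipeline: '$pipeline_out'"
echo "subshell: '$subshell_out' (status $subshell_status)"
```

A shell that follows the interpretation leaves subshell_out empty and
exits non-zero in the second case; a shell that prints 'here' is not
applying -e to the subshell's exit status.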

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: test -x should use faccessat, not stat

2010-04-02 Thread Eric Blake
According to Herbert Xu on 4/2/2010 8:03 AM:
 After much deliberation (alright, I've simply been busy elsewhere :)
 I've committed this patch.
 
 commit 1d68712ba2e439f36874c4ed1e3d9ffec177a06c
 Note that faccessat doesn't handle ACLs when euid != uid, as
 this case is currently implemented by glibc instead of the kernel,
 using code similar to the existing dash test.

That faccessat bug is only true for current Linux kernels.  Cygwin
faccessat does the correct thing, even when euid != uid.

Thanks for applying this.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net


Re: bugs in cd

2009-08-31 Thread Eric Blake
According to Eric Blake on 7/14/2009 3:39 PM:
 For the cd command, POSIX 2008 requires that after all pathnames in CDPATH
 have been tested and failed in step 5, then step 6 interprets the directory
 argument relative to PWD.  In other words, this demonstrates a bug:
 
 $ dash -c 'cd /tmp; mkdir -p foo; CDPATH=oops; cd foo; echo $?; pwd'
 cd: 1: can't cd to foo
 2
 /tmp
 
 while bash gets it correct:
 
 $ bash -c 'cd /tmp; mkdir -p foo; CDPATH=oops; cd foo; echo $?; pwd'
 0
 /tmp/foo
 
 Furthermore, POSIX requires that if the element in CDPATH ends in slash,
 that no additional slashes are added while forming the candidate curpath.
 In light of the fact that //home need not be the same directory as /home
 (and indeed, on cygwin, they are distinct entities), this is also a bug:
 
 $ dash -c 'CDPATH=/; cd home'
 //home
 $ bash -c 'CDPATH=/; cd home'
 /home
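
 The first of those two cases can be turned into a self-contained check,
 sketched here under a few assumptions: mktemp -d is available (it is not
 specified by POSIX), and the names scratch, expected and actual are
 invented for the example.

```shell
# Assumption: mktemp -d exists; it creates a private scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/foo"

# Physical path of the target, for comparison.
expected=$(cd "$scratch/foo" && pwd -P)

# CDPATH names a directory that does not exist, so no CDPATH entry matches
# and cd has to fall back to interpreting foo relative to the current directory.
actual=$(cd "$scratch" && CDPATH=oops && cd foo >/dev/null && pwd -P) || actual=failed

echo "expected=$expected"
echo "actual=$actual"
rm -rf "$scratch"
```

 A shell that implements step 6 prints the same path twice; one with the
 bug described above reports actual=failed.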

Ping.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net


Re: avoid compiler warning

2009-08-31 Thread Eric Blake
According to Herbert Xu on 8/11/2009 3:56 PM:
 On Tue, Aug 11, 2009 at 09:33:43AM -0700, H. Peter Anvin wrote:
 Herbert... the *type* is int, but the *value* has to be in the range
 [-1,UCHAR_MAX] or the behavior is undefined in both the C and POSIX
 standards.
 
 Good point.  I'll apply the patch.  I'd be very surprised though
 if this was the only instance in which we pass a char along.

Ping.  Or do we want to go with an alternate patch of defining our own
version of ISDIGIT that handles the entire range of int and avoids
checking the current locale, since POSIX guarantees that isdigit can only
return true for the ten bytes '0' through '9'?

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net


Re: avoid compiler warning

2009-08-11 Thread Eric Blake
According to Herbert Xu on 8/10/2009 10:03 PM:
 On Thu, Jul 09, 2009 at 12:55:25PM +, Eric Blake wrote:
 ccache gcc -DHAVE_CONFIG_H -I. -I..  -include ../config.h -DBSD=1 -DSHELL
 -DIFS_BROKEN  -Wall -gdwarf-2 -Wall -Werror -MT mystring.o -MD -MP -MF
 .deps/mystring.Tpo -c -o mystring.o mystring.c
 miscbltin.c: In function `umaskcmd':
 miscbltin.c:201: warning: subscript has type `char'

 isdigit is only defined over EOF and unsigned char values, so without this
 patch, you can trigger undefined behavior.
 
 What compiler and what libc was this? isdigit is supposed to
 be a function that takes an int argument according to POSIX.
 If libc implements it as a macro then it's up to it to cast
 the parameter to (int).

This is with recent newlib (the warning was intentionally added exactly to
catch the sort of bugs that my patch fixes), coupled with any version of
gcc 3.4 or newer.  Additionally, there is a pending bug report against
glibc requesting that glibc add the same QoI warning to flag potentially
buggy code, since it is quite easy to flag the use of raw char as a bug:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10296

In a particular demonstration of the bug, there are some locales where
isspace('\xff') is false but isspace((unsigned char)'\xff') [aka
isspace(0xff)] is true, when char is a signed type.  And although both
glibc and newlib cater to most instances of this bug (as a QoI
enhancement, these libraries guarantee that isspace('\xfe') [aka
isspace(-2)] returns the same result as isspace(0xfe)), not all platforms
have this QoI support, and can actually end up dereferencing outside of
array bounds.

Note that isdigit is a bit unique among the ctype macros: C89 and C99
state that it is locale dependent (with a range still limited to EOF or
[0-UCHAR_MAX]), but POSIX adds the additional restriction that it only
return true for the ten contiguous characters '0' through '9', meaning
that any POSIX-compliant version of isdigit(x) can be as simple as
((unsigned)(x)-'0'<=9) for all x, without regards to locale or
out-of-range arguments.  But not all locales comply with POSIX, so it is
not generally portable to rely on isdigit being this simple or fast.  On
the other hand, there are a number of packages that #define ISDIGIT,
rather than use ctype's isdigit, exactly to get the speedup guaranteed by
POSIX; this may be something you want to do in dash.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake e...@byu.net