Re: getopts appears to not be shifting $@ when consuming options

2021-01-29 Thread Jilles Tjoelker
On Fri, Jan 29, 2021 at 06:25:25PM +, earnestly wrote:
> In this example dash will repeatedly append 'attr=foo' to the list of
> parameters in an infinite loop:

> #!/bin/dash -x

> while getopts :a: arg -a foo -a bar; do
> case $arg in
> a) set -- "$@" attr="$OPTARG"
> esac
> done
> shift "$((OPTIND - 1))"

> Instead I expected this to result in parameter list containing
> 'attr=foo' and 'attr=bar'.

Hmm, that's pretty clever. By passing the parameters to be parsed to the
getopts command instead of via $@, it is possible to modify $@ without
violating POSIX's rule that only allows parsing a single unmodified set
of parameters (unless getopts is reset via OPTIND=1).

Unfortunately, dash and FreeBSD sh reset the getopts state when the
positional parameters are modified via set or shift. They probably do
this to avoid use after free and out of bounds memory access when a
script violates POSIX's rule. However, when the parameters come from the
getopts command, there is no violation of the rule and no memory
unsafety problem.

> This works in all shells I have been able to test with the exception of
> busybox sh:

> * sh   (bash)
> * bash (All versions from 1.14 through 5.1.4)
> * mksh (MIRBSD KSH R59 2020/05/16)
> * ksh  (93u+)
> * zsh  (5.8)
> * zsh --emulate sh
> * heirloom-sh (bourne)

> The only workaround I've found is to explicitly use `shift 2` in the a)
> case and obviate the final shift using OPTIND.  This will unfortunately
> break every other shell.

Since you need code to "un-eval" a list of parameters to construct the
getopts command above in a general case, a better workaround would be to
use that code to construct the attr="$OPTARG" list.

If your actual code is more like:

set -- -a foo -a bar # for testing
while getopts :a: arg; do
...

then the script violates the rule I mentioned, and has unspecified
results.

-- 
Jilles Tjoelker


Re: [PATCH] jobs: Block signals during tcsetpgrp

2021-01-06 Thread Jilles Tjoelker
On Wed, Jan 06, 2021 at 09:16:58PM +, Harald van Dijk wrote:
> On 06/01/2021 04:45, Herbert Xu wrote:
> > This patch implements the blocking of SIGTTOU (and everything else)
> > while we call tcsetpgrp.

> > Reported-by: Steffen Nurpmeso 
> > Signed-off-by: Herbert Xu 

> > diff --git a/src/jobs.c b/src/jobs.c
> > index 516786f..809f37c 100644
> > --- a/src/jobs.c
> > +++ b/src/jobs.c
> > @@ -1512,7 +1512,13 @@ showpipe(struct job *jp, struct output *out)
> >   STATIC void
> >   xtcsetpgrp(int fd, pid_t pgrp)
> >   {
> > -   if (tcsetpgrp(fd, pgrp))
> > +   int err;
> > +
> > +   sigblockall(NULL);
> > +   err = tcsetpgrp(fd, pgrp);
> > +   sigclearmask();
> > +
> > +   if (err)
> > sh_error("Cannot set tty process group (%s)", strerror(errno));
> >   }
> >   #endif

> While this is a step in the right direction, Jilles has already replied with
> an explanation of why this is not enough: if the terminal is in TOSTOP mode,
> it's not just tcsetpgrp() that needs to be handled, it's any write as well
> that may occur while the shell is not in the foreground process group. While
> it may be working according to design for messages written when the shell is
> not supposed to be in the foreground process group, it is another story when
> the shell is both responsible for taking itself out of the foreground
> process group and for writing a message. This is made worse by the fact that
> there is no synchronisation with child processes on errors, so even forcibly
> restoring the foreground process group may not be enough: unfortunate
> scheduling may result in a child process immediately setting the foreground
> process group to the child process after the parent process attempted to
> restore it to itself. I have not yet seen a good solution for this.

Comparing this error situation to the normal case, I think the right
solution is to close any stray pipe ends we have, wait for the remaining
child processes and only then report the error (throwing an exception as
normal). The child processes will probably terminate soon because of a
broken pipe, but even if they stop, they will not change the tty
foreground process group any more. The code in jobs.c will then reset
it.

The same error handling applies to the situation where pipe() fails.
This is a bit easier to test reliably, using ulimit -n.

Adding synchronization with the child processes slows down the normal
case for a rare error case, and the synchronization objects such as
pipe, eventfd, SysV semaphore or MAP_SHARED mapping might cause
unexpected issues in strange use cases.

A related bug: if fork fails for a command substitution, the pipe
created for reading the command output remains open (two descriptors).
This one is also in dash as well as FreeBSD sh. Throwing exceptions from
forkshell() may not be the best idea.

-- 
Jilles Tjoelker


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-24 Thread Jilles Tjoelker
On Wed, Dec 23, 2020 at 08:18:24PM +, Harald van Dijk wrote:
> On 21/12/2020 16:24, Jilles Tjoelker wrote:
> > Also, simply entering the command
> >  trap "echo TTOU" TTOU
> > in an interactive shell makes it behave strangely. Created jobs
> > immediately stop, "fg" works once but after that the shell tends to get
> > stuck quickly.

> Good catch, there is some work to be done there too.

> > This seems a good approach, but certain writes to the terminal may need
> > the same treatment, for example the error message when fork fails for
> > the second and further elements of a pipeline. This also makes me wonder
> > why SIGTTOU is ignored at all by default.

> True. This is hard to create a reliable test case for, but there is only
> limited code that can get executed while a job-control-enabled shell may not
> be in the foreground process group.

An rlimit (ulimit -u) will cause fork to fail after a number of
processes, but will not work reliably if other code is executing
concurrently with the same UID.

> If fork fails halfway through a pipeline, it may also be a problem that some
> of the commands in the pipeline may still run.

This can be handled much like the case where an element in the pipeline
fails immediately such as because a utility cannot be found. I am not
sure how well this currently works.

> > In mksh, the issue is resolved slightly differently: setting a trap on
> > TTOU pretends to work but the signal disposition remains set to ignored.

> zsh also rejects traps on TTOU, but does so explicitly:

>   zsh: can't trap SIGTTOU in interactive shells

> Amusingly, it prints this in any shell where job control is enabled,
> regardless of whether the shell is interactive.

The rejection makes sense for any shell instance that has job control
and uses tcsetpgrp(), whether interactive or not.

I am not entirely happy with the rejection idea, though, since the check
can be bypassed by temporarily disabling job control:
set +m; trap 'echo TTOU' TTOU; set -m
and running external utilities then fails with:
zsh: can't set tty pgrp: interrupt

> > Tradition is for job control shells to be a process group leader, but I
> > don't really know why. Changing this will not fix the issue entirely
> > anyway since the shell must perform tcsetpgrp() from the background when
> > a foreground job has terminated.

> I've been thinking more about this, and I suspect it's a another conflation
> between interactive mode and job control. In an interactive shell that's not
> executing any external program, you want any Ctrl-C to only send SIGINT to
> that shell, not to any other process. In order for that to work, it needs to
> be its own process group.

> This should, in my opinion, *only* happen for interactive shells, not for
> job-control-enabled non-interactive shells. Consider

>   sh -c 'sh -mc "while :; do :; done"; echo bye'

> The behaviour I would want is that Ctrl-C kills the parent shell, so that
> this does not print "bye". Different shells behave differently.

I think the main effect of the -m option is that it places jobs in
separate process groups, which may also be useful in scripts. After
something like:

set -m
foo &
foopid=$!
set +m

it is possible to  kill -- "-$foopid"  .

Although non-portable kernel features such as cgroups (Linux) or reaper
(FreeBSD) might be more useful for tracking processes like this, the
work to define how to use them in a shell has not been done.

In FreeBSD sh, I extended the possibilities here by allowing job control
without an accessible tty in non-interactive mode. This applies both if
there is no controlling terminal at all and if the shell is in the
background.

In the example 
sh -c 'sh -mc "while :; do :; done"; echo bye'
the -m flag seems to have no effect since that shell only runs builtins.

Changing it to
sh -mc 'sh -c "while :; do :; done"; echo bye'
the desired Ctrl+C behaviour can still work because the outer shell
exits when it notices the inner shell has terminated because of SIGINT.

-- 
Jilles Tjoelker


Re: dash 0.5.11.2, busybox sh 1.32.0, FreeBSD 12.2 sh: spring TTOU but should not i think

2020-12-21 Thread Jilles Tjoelker
On Sat, Dec 19, 2020 at 11:52:31PM +, Harald van Dijk wrote:
> On 19/12/2020 22:21, Steffen Nurpmeso wrote:
> > Steffen Nurpmeso wrote in
> >   <20201219172838.1b-wb%stef...@sdaoden.eu>:
> >   |Long story short, after falsely accusing BSD make of not working

> > After dinner i shortened it a bit more, and attach it again, ok?
> > It is terrible, but now less redundant than before.
> > Sorry for being so terse, that problem crosses my head for about
> > a week, and i was totally mislead and if you bang your head
> > against the wall so many hours bugs or misbehaviours in a handful
> > of other programs is not the expected outcome.

> I think a minimal test case is simply

> all:
> $(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good'

> unless I accidentally oversimplified.

Yes, and it can be simplified a bit more to remove make from the
picture:
(SHELL -c 'trap "echo TTOU" TTOU; set -m; echo all good'; :)

The idea is to start the shell without being a process group leader, set
a trap for SIGTTOU and enable job control.

Also, simply entering the command
trap "echo TTOU" TTOU
in an interactive shell makes it behave strangely. Created jobs
immediately stop, "fg" works once but after that the shell tends to get
stuck quickly.

> The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make its
> newly started process group the foreground process group when job control is
> enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's in dash, the
> other variants may have some small differences.) tcsetpgrp has this little
> bit in its specification:

>Attempts to use tcsetpgrp() from a process which is a member of
>a background process group on a fildes associated with its con‐
>trolling  terminal  shall  cause the process group to be sent a
>SIGTTOU signal. If the calling thread is blocking SIGTTOU  sig‐
>nals  or  the  process is ignoring SIGTTOU signals, the process
>shall be allowed to perform the operation,  and  no  signal  is
>sent.

> Ordinarily, when job control is enabled, SIGTTOU is ignored. However, when a
> trap action is specified for SIGTTOU, the signal is not ignored, and there
> is no blocking in place either, so the tcsetpgrp() call is not allowed.

Right.

> The lowest impact change to make here, the one that otherwise preserves the
> existing shell behaviour, is to block signals before calling tcsetpgrp and
> unblocking them afterwards. This ensures SIGTTOU does not get raised here,
> but also ensures that if SIGTTOU is sent to the shell for another reason,
> there is no window where it gets silently ignored.

This seems a good approach, but certain writes to the terminal may need
the same treatment, for example the error message when fork fails for
the second and further elements of a pipeline. This also makes me wonder
why SIGTTOU is ignored at all by default.

In mksh, the issue is resolved slightly differently: setting a trap on
TTOU pretends to work but the signal disposition remains set to ignored.

> Another way to fix this is by not trying to make the shell start a new
> process group, or at least not make it the foreground process group. Most
> other shells appear to not try to do this.

Tradition is for job control shells to be a process group leader, but I
don't really know why. Changing this will not fix the issue entirely
anyway since the shell must perform tcsetpgrp() from the background when
a foreground job has terminated.

What is definitely required, though, is that the shell not read input or
alter terminal settings if it is started in the background (possibly
unless the script explicitly ignored SIGTTOU). This is what the loop
with tcgetpgrp() does.

-- 
Jilles Tjoelker


Re: shell: Always use explicit large file API

2020-05-09 Thread Jilles Tjoelker
On Thu, May 07, 2020 at 11:42:12PM +1000, Herbert Xu wrote:
> There are some remaining stat/readdir calls in dash that may lead
> to spurious EOVERFLOW errors on 32-bit platforms.  This patch changes
> them (as well as open(2)) to use the explicit large file API.

> Reported-by: Tatsuki Sugiura 
> Signed-off-by: Herbert Xu 

> diff --git a/configure.ac b/configure.ac
> index 5dab5aa..dbd97d8 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -144,8 +144,13 @@ AC_CHECK_FUNC(stat64,, [
>   AC_DEFINE(fstat64, fstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(lstat64, lstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(stat64, stat, [64-bit operations are the same as 32-bit])
> + AC_DEFINE(readdir64, readdir,
> +   [64-bit operations are the same as 32-bit])
> + AC_DEFINE(dirent64, dirent,
> +   [64-bit operations are the same as 32-bit])
>  ])
> [snip]

Is it possible to use AC_SYS_LARGEFILE and the normal
fstat/lstat/readdir/dirent/open/etc instead of the non-standard "64"
interfaces?

I understand that dash formerly used the "64" interfaces selectively, so
that the binary could be a bit smaller by avoiding 64-bit numbers where
they were not necessary. Now that the feature "supports files with inode
numbers that do not fit in 32 bits" is considered essential, this
complexity seems unnecessary.

I'm sorry for not providing a patch.

-- 
Jilles Tjoelker


Re: [v2 PATCH] eval: Reset handler when entering a subshell

2019-03-03 Thread Jilles Tjoelker
On Sun, Mar 03, 2019 at 04:43:09PM +, Harald van Dijk wrote:
> On 03/03/2019 13:57, Herbert Xu wrote:
> > On Sat, Mar 02, 2019 at 01:28:44PM +, Harald van Dijk wrote:

> > > That is *a* real bug here, not *the* real bug. This leaves the buggy
> > > behaviour of the "command" built-in intact in the test case I included in
> > > the message you replied to.

> > I don't quite understand.  Could you explain what is still buggy
> > after the following patch?

> I gave this example in my previous message:

>   command . /dev/stdin <   set -o invalid
>   echo a
>   EOF
>   echo b

> Ordinarily, "set -o invalid" is a "special built-in utility error", for
> which the consequence is that a non-interactive shell "shall exit". If it
> weren't a special built-in utility, it would be an "other utility (not a
> special built-in) error", for which a non-interactive shell "shall not
> exit".

> The effect of the command built-in is that "if the command_name is the same
> as the name of one of the special built-in utilities, the special properties
> in the enumerated list at the beginning of Special Built-In Utilities shall
> not occur." This is ambiguous as to whether it is just about the special
> properties associated with the . command, or whether it includes those
> associated with the set command called indirectly, but it must be one or the
> other. If it includes the set command, then the shell shall not exit, and
> the correct output is a followed by b. If it does not include the set
> command, then the shell shall exit, and the correct output is nothing. dash
> outputs b, which is wrong in either case.

The way I interpret this is that any error while parsing or executing
the . or eval command(s) is a "special built-in utility error" which can
be "caught" using "command". Therefore, the correct output is an error
message about the invalid option (to stderr) followed by b. This is what
happens in FreeBSD sh (for quite a few years), ksh93 (20120801_2) and
mksh (56c).

Given that the text is somewhat ambiguous, I think bash's POSIX mode
behaviour of exiting may also be valid.

I don't like disabling the special properties of the inner set command
since this creates a different kind of script for 'command .' versus
'.'. Normally (though not in #!/bin/bash scripts), if I write something
like 'set -o pipefail', I can safely assume that it will take if the
shell continues to the next command; if the shell does not support the
option, it exits.

The fairly old dash v0.5.9.1 I tried works even differently, writing an
error message (to stderr) followed by a. So it first continues after the
error and then aborts when '.' ends. This seems clearly broken (but I
think things have changed since back then).

-- 
Jilles Tjoelker


Re: [PATCH] main: don't raise exception when executing dotcmd() on a non-existent file

2018-11-24 Thread Jilles Tjoelker
On Sat, Nov 24, 2018 at 05:20:21PM +0100, Antonio Ospite wrote:
> When sourcing a file with the dotcmd() builtin, dash raises an exception
> when the input file cannot be opened.

> For instance the following script fails prematurely and does not output
> the message:

>   #!/bin/sh

>   . /non-existent-file
>   echo "Survived!"

> In this case, the behavior of dash is different from other shells (e.g.
> bash, zsh) which are more forgiving and allow execution to carry on even
> if the input file cannot be opened.

POSIX is unambiguous (XCU 2.14 Special Built-In Utilities -> dot) that a
non-interactive shell shall abort when a dot script is not found, and
bash and zsh comply to this when standards compliance is requested (e.g.
by naming the shell "sh" in argv[0] or using "set -o posix" in bash or
"set -o posixbuiltins" in zsh). Most other shells (e.g. mksh, FreeBSD
sh, yash) comply to POSIX unconditionally here, like dash currently
does.

> Fix this by passing the INPUT_NOFILE_OK flag when calling setinputfile()
> in dotcmd().

> As a bonus, this also fixes cases like the following:

>   set -e
>   . /non-existent-file || true
>   echo "Survived! Let's do something else..."

> This idiom is sometimes used in shell script to source a config file
> with default values only to provide fallback values if the default
> values were not available.

The above code is specific to bash and zsh non-POSIX modes, and will not
work in a #!/bin/sh script unless specific steps are taken (such as "set
+o posix" or starting the script with "bash script1" rather than
"./script1").

The simple solution is
  [ ! -f /non-existent-file ] || . /non-existent-file
Time-of-check-time-of-use issues should not be an issue for config
files.

Alternatively, one could try
  command . ./non-existent-file || true
but this may ignore more errors than desired (such as syntax errors in
the sourced file) and does not work correctly in yash 2.30.

-- 
Jilles Tjoelker


Re: [PATCH] Use a real non-cryptographic hash algorithm

2018-11-16 Thread Jilles Tjoelker
On Thu, Nov 15, 2018 at 09:27:42AM -0500, Devin Hussey wrote:
> >From 502cbd61e386eb014035df66b032e90a9de926b7 Mon Sep 17 00:00:00 2001
> From: Devin Hussey 
> Date: Thu, 15 Nov 2018 09:12:24 -0500
> Subject: [PATCH] Use a real non-cryptographic hash algorithm

> dash previously used a modified version of the LoseLose algorithm.

> LoseLose is a TERRIBLE algorithm.

> It has terrible distribution, leaving many hash entries unused.

> However, more importantly, it is incredibly easy to make a
> collision; even something as simple as rearranging the letters.

> For example. these all produce the hash 1652:

> Hello
> Holle
> Hlleo
> Hoell
> Haxed\n
> HASH\tBAD
> Hater

> Collsions in hash tables are very bad because it requires
> looking through a linked list, and the benefits of O(1) time
> in a hash table are completely lost.

> The FNV-1a algorithm has comparable performance and code size
> while having a much better collision rate.

You do have a dependency chain with a multiplication per character, so
it is less suitable for long strings. There are implementation
techniques to avoid the dependency chain, but I'm not sure it's worth
the trouble.

> While there are some other faster and more resistant hashes out there,
> namely xxHash, Murmur2, and CityHash, the benefits are minimal on
> short strings and they expect a length argument, unlike FNV-1a which
> simply uses null-termination, and they are not in the public domain
> like FNV-1a.

> [snip]
> +hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +hashval = (hashval ^ (unsigned char)*p++) * FNV_PRIME;
> [snip]
> +hashval = (hashval ^ (unsigned char) *p++) + FNV_PRIME;

I suppose this latter one should be * instead of + as well?

-- 
Jilles Tjoelker


Re: [PATCH 4/5] Stop using deprecated function sigsetmask()

2018-10-16 Thread Jilles Tjoelker
On Tue, Oct 16, 2018 at 06:42:19PM +0200, Antonio Ospite wrote:
> sigsetmask() is deprecated, at least on recent glibc; stop using it to
> silence the following compiler warning:

> ---
> system.h:40:2: warning: ‘sigsetmask’ is deprecated [-Wdeprecated-declarations]
>   sigsetmask(0);
>   ^~
> In file included from /usr/include/x86_64-linux-gnu/sys/param.h:28,
>  from shell.h:52,
>  from nodes.c:46:
> /usr/include/signal.h:173:12: note: declared here
>  extern int sigsetmask (int __mask) __THROW __attribute_deprecated__;
> ^~
> ---

> Using sigprocmask() and friends unconditionally should not be a problem,
> as commit e94a964 (eval: Add vfork support, 2018-05-19) also does it.

The git history starts (in 2005) after HAVE_SIGSETMASK was added, but I
expect it was done this way to save a few bytes in the executable. With
ProPolice, the effect may be a bit more since a local variable of type
sigset_t often contains an array and may cause functions to be compiled
with stack protection when they otherwise wouldn't (note that
sigclearmask() is inline).

Both FreeBSD and NetBSD simply changed the sigsetmask() call to a
sigprocmask() call very early on (1995-1996).

Personally, I think clean code that compiles without warnings is more
important than making the executable as small as possible, but the
maintainer may not agree.

-- 
Jilles Tjoelker


Re: [PATCH 1/6] exec: Don't execute binary files if execve() returned ENOEXEC.

2018-09-11 Thread Jilles Tjoelker
On Fri, Sep 07, 2018 at 10:34:09AM +0200, Andrej Shadura wrote:
> From: Adam Borowski 

> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.

> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.

> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution."

I think this is worth the extra code, even though POSIX does not require
it.

Neither the file_is_binary function nor its caller take care of
preserving errno, which might lead to confusing error messages.

On another note, this is not fully POSIX-compliant because it rejects
some files that comply to POSIX's definition of a text file. Per POSIX's
definition, only files containing '\0' can be rejected (or containing
too long lines or not ending with a newline, both of which seem like a
bad idea to check); this probably requires checking more than the first
line for decent detection. However, I suppose the benefits and
disadvantages were weighed here and non-compliance was decided.

> Signed-off-by: Adam Borowski 
> Signed-off-by: Andrej Shadura 
> ---
>  src/exec.c | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/src/exec.c b/src/exec.c
> index 9d0215a..631 100644
> --- a/src/exec.c
> +++ b/src/exec.c
> @@ -148,6 +148,36 @@ shellexec(char **argv, const char *path, int idx)
>  }
>  
>  
> +/*
> + * Check if an executable that just failed with ENOEXEC shouldn't be
> + * considered a script (wrong-arch ELF/PE, junk accidentally set +x, etc).
> + * We check only the first line to allow binaries encapsulated in a shell
> + * script without proper quoting.  The first line, if not a hashbang, is
> + * likely to contain comments; even ancient encodings, at least popular
> + * ones, don't use 0x7f nor values below 0x1f other than whitespace (\t,
> + * \n, \v, \f, \r), ISO/IEC 2022 can have SI, SO and \e.
> + */
> +STATIC int file_is_binary(const char *cmd)
> +{
> + char buf[128];
> + int fd = open(cmd, O_RDONLY|O_NOCTTY);
> + if (fd == -1)
> + return 1;
> + int len = read(fd, buf, sizeof(buf));
> + close(fd);
> + for (int i = 0; i < len; ++i) {
> + char c = buf[i];
> + if (c >= 0 && c <= 8 ||
> + c >= 16 && c <= 31 && c != 27 ||
> + c == 0x7f)
> + return 1;
> + if (c == '\n')
> + return 0;
> + }
> + return 0;
> +}
> +
> +
>  STATIC void
>  tryexec(char *cmd, char **argv, char **envp)
>  {
> @@ -162,6 +192,8 @@ repeat:
>   execve(cmd, argv, envp);
>  #endif
>   if (cmd != path_bshell && errno == ENOEXEC) {
> + if (file_is_binary(cmd))
> + return;
>   *argv-- = cmd;
>   *argv = cmd = path_bshell;
>   goto repeat;

-- 
Jilles Tjoelker


Re: [v2 PATCH] expand: Escape minus sign in arithmetic expansion

2018-05-31 Thread Jilles Tjoelker
On Tue, May 29, 2018 at 02:00:26AM +0800, Herbert Xu wrote:
> On Mon, May 28, 2018 at 12:22:00AM +0800, Herbert Xu wrote:
> > The minus sign generated from arithmetic expansion is currently
> > unquoted which causes anomalies when the result is used in where
> > the quoting matters.

> > This patch fixes it by explicitly calling memtodest for the minus
> > sign.

> > Signed-off-by: Herbert Xu 

> This was buggy.  Here is an update.

> ---8<---
> The minus sign generated from arithmetic expansion is currently
> unquoted which causes anomalies when the result is used in where
> the quoting matters.

> This patch fixes it by explicitly calling memtodest for the minus
> sign.

> Signed-off-by: Herbert Xu 

> diff --git a/src/expand.c b/src/expand.c
> index 7a51766..7ed1bc0 100644
> --- a/src/expand.c
> +++ b/src/expand.c
> @@ -490,7 +490,14 @@ expari(int flag)
>   result = arith(p + 1);
>   popstackmark(&sm);
>  
> - len = cvtnum(result);
> + len = 0;
> + if (result < 0) {
> + memtodest("-", 1, flag);
> + result = -result;

This negation overflows if result is INTMAX_MIN, leading to undefined
behaviour. Perhaps leave the memtodest call but use STADJUST to remove
the minus sign itself, leaving only the CTLESC or nothing.

> + len++;
> + }
> +
> + len += cvtnum(result);
>  
>   if (likely(!(flag & EXP_QUOTED)))
>   recordregion(begoff, begoff + len, 0);

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exec: Stricter pathopt parsing

2018-05-15 Thread Jilles Tjoelker
On Tue, May 15, 2018 at 04:49:51PM +0800, Herbert Xu wrote:
> This patch changes the parsing of pathopt.  First of all only
> %builtin and %func (with arbitrary suffixes) will be recognised.
> Any other pathopt will be treated as a normal directory.

> Furthermore, pathopt can now be specified before the directory,
> rather than after it.  In fact, a future version may remove support
> for pathopt suffixes.

> This is so that it is less likely that a genuine directory containing
> a % sign is parsed as a pathopt.

Some examples/testcases would make this clearer.

Note that this also affects CDPATH (good) and MAILPATH (breaks its
undocumented feature to replace the "you have mail" text, which seems
unimportant).

> The patch also removes the clearcmdentry optimisation where we
> attempt to only partially flush the table where possible.

That optimisation is not only a waste of code but also not
POSIX-compliant. XCU 2.9.1.1 Command Search and Execution is clear that
the shell is required to forget remembered locations after an assignment
to PATH, and non-normative text with the hash utility remarks that
PATH="$PATH" is an alternative to hash -r that does not need the XSI
option.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: builtin: Mark more regular built-ins

2018-05-15 Thread Jilles Tjoelker
On Tue, May 15, 2018 at 07:27:29PM +0800, Herbert Xu wrote:
> This patch marks the following built-ins as regular, meaning that
> they cannot be overriden using PATH search:

>   hash
>   pwd
>   type
>   ulimit

This seems correct, but lacks rationale. The rationale is that pwd
should have been here since long ago (because it is in the list in XCU
2.9.1.1 Command Search and Execution), while the other three are new in
SUSv4tc2.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in dash 0.5.10 related to subshells

2018-05-06 Thread Jilles Tjoelker
On Sun, May 06, 2018 at 12:02:43AM +0800, Herbert Xu wrote:
> Subject: jobs - Do not block when waiting on SIGCHLD

> Because of the nature of SIGCHLD, the process may have already been
> waited on and therefore we must be prepared for the case that wait
> may block.  So ensure that it doesn't by using WNOHANG.

> Fixes: 03876c0743a5 ("eval: Reap zombies after built-in...")
> Signed-off-by: Herbert Xu 

> diff --git a/src/jobs.c b/src/jobs.c
> index 1a97c54..6dc555f 100644
> --- a/src/jobs.c
> +++ b/src/jobs.c
> @@ -975,8 +975,8 @@ waitforjob(struct job *jp)
>   int st;
>  
>   TRACE(("waitforjob(%%%d) called\n", jp ? jobno(jp) : 0));
> - while ((jp && jp->state == JOBRUNNING) || gotsigchld)
> - dowait(DOWAIT_BLOCK, jp);
> + while (jp ? jp->state == JOBRUNNING : gotsigchld)
> + dowait(jp ? DOWAIT_BLOCK : DOWAIT_NORMAL, jp);
>   if (!jp)
>   return exitstatus;
>   st = getstatus(jp);

Now each of the first four executable lines of waitforjob() does
something different for jp == NULL and jp != NULL. It probably makes
more sense to separate the jp == NULL case into a new function.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: expand: Do not quote backslashes in unquoted parameter expansion

2018-03-27 Thread Jilles Tjoelker
On Tue, Mar 27, 2018 at 08:19:19PM +0800, Herbert Xu wrote:
> On Tue, Mar 27, 2018 at 01:07:21PM +0200, Harald van Dijk wrote:

> > If POSIX specifies a result, and a shell applies an optimisation that causes
> > a different result to be produced, doesn't that inherently disallow that
> > optimisation? There may be some confusion and/or disagreement over what
> > exactly POSIX specifies and/or intends to specify, but I don't see how you
> > can argue that POSIX specifies result A, but it's okay to apply an
> > optimisation that produces result B.

> Wait, you're talking about something completely different.  The
> crux of the issue is whether POSIX allows the backslash to escape
> the character after it or does it have to be a literal backslash.

> In fact POSIX is perfectly clear on this:

>   2.13.1 Patterns Matching a Single Character

>   The following patterns matching a single character shall match a
>   single character: ordinary characters, special pattern characters,
>   and pattern bracket expressions. The pattern bracket expression
>   also shall match a single collating element. A 
>   character shall escape the following character. The escaping
>shall be discarded. If a pattern ends with an
>   unescaped , it is unspecified whether the pattern
>   does not match anything or the pattern is treated as invalid.

> Nowhere does it say that backslashes that come from the result of
> parameter expansion is always literal.

> So it's clear that any shell that treats the backslash as a literal
> backslash is already broken.

I don't think it is clear at all. Note the final paragraph of 2.13.1:

] When pattern matching is used where shell quote removal is not
] performed (such as in the argument to the find -name primary when find
] is being called using one of the exec functions as defined in the
] System Interfaces volume of POSIX.1-2008, or in the pattern argument
] to the fnmatch() function), special characters can be escaped to
] remove their special meaning by preceding them with a 
] character. This escaping  is discarded. The sequence "\\"
] represents one literal . All of the requirements and
] effects of quoting on ordinary, shell special, and special pattern
] characters shall apply to escaping in this context.

This implies that special characters cannot be escaped using a backslash
in a context where shell quote removal is performed. Instead, special
characters can be escaped using shell quoting. As a result, the simplest
form of parameter expansion has either all or none of the generated
characters quoted (depending on whether the expansion is in
double-quotes or not).

There is also a sentence "The shell special characters always require
quoting." which seems untrue in practice if the characters come from an
expansion. Something like

  touch 'a&b'
  sh -c 'w=*\&*; printf "%s\n" $w'

works for many shells as sh. However, this could be explained away as
undefined behaviour.

Furthermore, POSIX generally intends to specify the behaviour of the
Bourne shell and ksh88. If the Bourne shell and ksh88 both agree, and do
not match an interpretation of POSIX, then it is likely that the
interpretation is wrong or the standard text has a bug.

> > Can you show me any shell other than bash that lets this
> > optimisation affect the results?

> The fact is that the other shells are not doing what they do because
> they're not doing this optimisation.  They do what they do because
> they have broken POSIX because they always treat backslashes that
> arise from parameter expansion as literal backslashes.

Let's use clear terminology: if it affects behaviour, it is by
definition not an optimization. It is a special case for words not
containing special pattern characters (for some value of "special
pattern characters").

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] $PATH not fully searched

2018-01-10 Thread Jilles Tjoelker
On Wed, Jan 10, 2018 at 01:36:18PM -0500, Joshua Nelson wrote:
> I've come across an error with the PATH variable in `dash`. Instead of
> fully searching PATH for commands, dash will respond 'command not
> found' if `command -p ` fails.  The same command works
> fine in Bash.

> The following example was performed in a live cd of Debian 9.1 Stretch
> (run in a virtual machine), with dash version 0.5.8-2.4:

> ```bash
> user@debian:~$ PATH="~/my_bin:$PATH" dash
> $ echo $PATH
> ~/my_bin:/usr/local/bin:/usr/bin:/usr/local/games:/usr/games
> $ ls -l ~/my_bin/list
> -rwxr--r-- 1 user user 18 Jan 10 17:42 /home/user/my_bin/list
> $ list
> dash: 3: list: not found
> $ exit
> user@debian:~$ PATH="$PATH:~/my_bin" list
> Desktop Documents Downloads Music my_bin Pictures Public Templates Videos
> ```

> I believe but am not certain that this is related to the following patch:
> https://www.mail-archive.com/dash@vger.kernel.org/msg01329.html

In your example, the tilde is quoted and ends up literally in PATH.
Then, in bash only, tilde expansion is attempted again during the
search. In my testing, zsh, mksh, ksh93, dash and FreeBSD sh do not
implement this feature. Enabling POSIX mode in bash also disables the
feature.

What works more portably is
  PATH=~/my_bin:$PATH dash
or
  PATH=$PATH:~/my_bin list
which expands the tilde at the time of the assignment. This works pretty
much everywhere except in old real Bourne shells, which do not implement
tilde expansion at all.

If you want to quote the $PATH part, it is possible but not necessary
since it is in a variable assignment which is not subject to pathname
generation and word splitting anyway. However, if "export" is prepended,
I strongly recommend quoting it since some shells such as dash do not
implement the special rule for assignment utilities yet.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] quoted substring parameter expansion ignores single-quotes in the arg

2017-10-21 Thread Jilles Tjoelker
On Thu, Oct 19, 2017 at 12:41:30PM +0200, G.raud wrote:
> Wrong bug report.

> In fact the beahviour of dash on "a=\\'a\\'; echo \"\${a%'}\"" is that
> of POSIX and of zsh 5.  However 'bash --posix', 'mksh -o posix' and
> pdksh fail to parse the command and ksh does not remove the quote from
> $a.

I think your original bug report is actually valid. Although the POSIX
standard may be hard to interpret (make sure to use the latest (2016)
edition), I think it is sufficiently clear that various special
characters are active in ${param#word}, whether the outer substitution
is within double-quotes or not. See this sentence in 2.6.2 Parameter
Expansion:

] Enclosing the full parameter expansion string in double-quotes shall
] not cause the following four varieties of pattern characters to be
] quoted, whereas quoting characters within the braces shall have this
] effect.

More generally, if bash --posix and ksh93 agree about something, it is
usually correct (use something like "${a#'*'}", since "${a#'}" is
wrong). Although zsh is a good interactive shell, it does not follow
POSIX as closely; not even in sh or ksh emulation mode.

Also note that dash does implement this correctly if a metacharacter is
quoted using a single-quote or a backslash:

$ v=**
$ echo "${v#*}"
**
$ echo "${v#"*"}"
*
$ echo "${v#\*}"
*

In dash's internals, this bug is caused by not distinguishing between
+/-/=/? and #/##/%/%% varieties in the parser. Not maintaining state for
eacg nesting level makes the parser more elegant, but it is untenable
since there is no way one can parse both "${v+'}" and "${v#'*'}"
correctly without it.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Small cleanup to src/dash.1 - Command Line Editing

2017-06-23 Thread Jilles Tjoelker
On Sat, Jun 17, 2017 at 03:53:26PM +0100, Larry Hynes wrote:
> src/dash.1, under Command Line Editing, states:

>   It's similar to vi: typing  will throw you into command
>   VI command mode.

> - There appears to be no need for both occurrences of 'command'
> - I can't see a reason for VI to be capitalised
> - 'will throw you into' seems a little... enthusiastic

> Following diff changes it to

>   It's similar to vi: typing  enters vi command mode.

> diff --git a/src/dash.1 b/src/dash.1
> index 8b8026d..f35d89d 100644
> --- a/src/dash.1
> +++ b/src/dash.1
> @@ -2232,7 +2232,7 @@ enabled, sh can be switched between insert mode and 
> command mode.
>  The editor is not described in full here, but will be in a later document.
>  It's similar to vi: typing
>  .Aq ESC
> -will throw you into command VI command mode.
> +enters vi command mode.
>  Hitting
>  .Aq return
>  while in command mode will pass the line to the shell.

I agree. If you're changing things here anyway, I suggest getting rid of
the contraction as well (changing It's to It is). The fairly formal
style of man pages avoids contractions, just like it avoids "you".

The reference to the "later document" can probably be removed as well,
since said document does not exist yet after many years.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Don't execute binary files if execve() returned ENOEXEC.

2017-02-08 Thread Jilles Tjoelker
On Wed, Feb 08, 2017 at 09:02:33AM +0100, Adam Borowski wrote:
> On Tue, Feb 07, 2017 at 11:52:08PM +0100, Jilles Tjoelker wrote:
> > On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> > > Both "dash -c foo" and "./foo" are supposed to be able to run 
> > > hashbang-less
> > > scripts, but attempts to execute common binary files tend to be nasty:
> > > especially both ELF and PE tend to make dash create a bunch of files with
> > > unprintable names, that in turn confuse some tools up to causing data 
> > > loss.

> > > Thus, let's read the first line and see if it looks like text.  This is a
> > > variant of the approach used by bash and zsh; mksh instead checks for
> > > signatures of a bunch of common file types.

> > > POSIX says: "If the executable file is not a text file, the shell may 
> > > bypass
> > > this command execution.".

> > In FreeBSD sh, I have done this slightly differently (since 2011), based
> > on POSIX's definition of a text file in XBD 3:

> > ] A file that contains characters organized into zero or more lines. The
> > ] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> > ] in length, including the  character.

> Using that definition would require reading the whole file, which can
> obviously be slow and/or lead to DoS.

> Also, it would reject some scripts currently accepted by popular shells,
> including bash, mksh and present version of dash -- none of which ban
> lines above LINE_MAX (2048) in length.

The part you quoted has does not require that the shell bypass the
execution (of the shell) for any file that is not a text file, but that
the shell not bypass the execution for any file that is a text file.

Therefore, the shell may (but is not required to) bypass the execution
if the file contains 0 bytes, contains invalid byte sequences that do
not form a character, contains lines longer than {LINE_MAX} or ends with
a character that is not a newline. Otherwise, the shell must perform the
execution.

The line length is indeed somewhat strange: the INPUT FILES section of
XCU 4 Utilities -> sh requires the shell to read scripts without the
{LINE_MAX} limit, but the part you quoted does not use this modified
definition of a text file. In practice this is not relevant since shells
are not going to read more than {LINE_MAX} for this check anyway, as you
say.

> > The check is simply for a 0 byte in the first 256 bytes (how many bytes
> > are read by pread() for 256 bytes). A file containing the byte 8, for
> > example, can still be a text file per POSIX's definition.

> The XBD 3 cannot possibly specify allowed byte values, as it doesn't even
> know which is the newline, nor does it require even its own "portable
> character set" being directly accessible (merely that it's included in one
> of shiftable sets provided by a locale).

> On the other hand, not only dash assumes 8-bit bytes and ASCII-compatible
> charset (what a limitation!), but we also have knowledge about which byte
> values are not a part of any printable string in any locale on a supported
> platform.  And the set of supported character sets is going to only
> decrease[1].

Text files need not contain printable characters only.

> > This check might cause a terse script with binary to fail to execute,
> > but I have not received bug reports about that.

> In today's brave "curl|sudo sh" world, it's a concern I wouldn't dismiss.
> The first line will be a legitimate command to the shell, the rest is often
> arbitrary binary junk.

I think problems are unlikely because most scripts use the #! method
instead of relying on this shell feature.

> > Stopping the check with a \n will cause a PNG header to be considered
> > text.

> Good point.  It does fool bash and mksh (with mksh's different approach)
> too.  On the other hand, PNG files are not something that's likely to have a
> bogus +x permission, and unlike PE or ELF files, empirically I didn't notice
> any nasty results.  It's still bad -- not sure what's the best solution.

Anything can have a bogus +x permission on FAT, NTFS and other
filesystems that do not support executable permissions at all.

> > Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
> > read the data, in order to minimize the potential for modifying things
> > by reading.

> The manpage for open(2) on Linux says that, while currently O_NONBLOCK has
> no effect for regular files and block devices, the expected semantics might
> eventually be implemented -- and I guess you're not going to prefault the
> beginning of the file before reading,

Re: [PATCH] Don't execute binary files if execve() returned ENOEXEC.

2017-02-07 Thread Jilles Tjoelker
On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.

> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.

> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution.".

> Signed-off-by: Adam Borowski 
> ---
> This has been applied in Debian.  While technically it's only a "may" issue
> in dash itself, and is triggered by user error (trying to exec files you
> shouldn't exec), the fallout is nasty enough that the bug was classified as
> serious.

> The usual failure mode is to create files with names such as:
> (per submitter, a PE):
> 90 d4 f6
> (an ELF):
> 01 b0 40 40 08 01 40 38 02 40 04 03 01 05 40 40 ed ed
> 01 b0 40 40 f8 40 38 02 40 04 03 01 05 40 40 da da

In FreeBSD sh, I have done this slightly differently (since 2011), based
on POSIX's definition of a text file in XBD 3:

] A file that contains characters organized into zero or more lines. The
] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
] in length, including the  character.

The check is simply for a 0 byte in the first 256 bytes (how many bytes
are read by pread() for 256 bytes). A file containing the byte 8, for
example, can still be a text file per POSIX's definition.

This check might cause a terse script with binary to fail to execute,
but I have not received bug reports about that.

Stopping the check with a \n will cause a PNG header to be considered
text.

Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
read the data, in order to minimize the potential for modifying things
by reading.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: why does dash save, dup, and restore redirected descriptor in the parent, rather than redirect in the child?

2017-02-03 Thread Jilles Tjoelker
On Tue, Jan 31, 2017 at 06:27:00PM -0800, Parke wrote:
> On Tue, Jan 31, 2017 at 12:00 PM, Mark Galeck  wrote:
> > This is strange. Why save, dup and dup again to restore, descriptors
> > in the parent, when it would be much simpler to just dup in the
> > child, and not have to save and restore. This is simpler and I
> > checked it works the same:

> > I am sure there must be a good reason and I am not understanding
> > something deeper. What is it?

> I am not a dash developer, but one reason to make system calls in the
> parent is that it is much simpler to handle errors in the parent.

> In your example:

> > if (!fork()) {
> > fd = open64("foobar.txt", O_WRONLY|O_CREAT);
> > dup2(fd, 1);
> > execl("/bin/date", "date", (char *)NULL);
> > }

> What happens if open64 fails?  How should the child inform the parent
> of this specific error?

In general, you are right, but in the shell's case there is no
difficulty. The error is reported via an error message to stderr and a
non-zero exit status of the command, and the child process can easily do
those things.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] trap: fix memory leak in exitshell()

2016-11-22 Thread Jilles Tjoelker
On Mon, Nov 21, 2016 at 10:40:52PM +0100, Andreas Bofjall wrote:
> After dash had executed the exit trap handler, the trap was reset but
> the pointer was never freed. This leak can be demonstrated by running
> dash through valgrind and executing the following shell script:

>   foo() {
>   true
>   }
>   trap foo EXIT

> Fix by properly freeing the trap pointer in exitshell().

> Signed-off-by: Andreas Bofjall 
> ---
>  src/trap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/trap.c b/src/trap.c
> index edb9938..5418b07 100644
> --- a/src/trap.c
> +++ b/src/trap.c
> @@ -389,6 +389,7 @@ exitshell(void)
>   trap[0] = NULL;
>   evalskip = 0;
>   evalstring(p, 0);
> + ckfree(p);
>   }
>  out:
>   /*

This patch will shut up valgrind in the common case, but does not handle
the general case. The command string may contain an error or invoke the
exit builtin and in either case the command string will be leaked
(SIGINT might be expected to have a similar effect, but behaves
strangely from an EXIT trap in dash).

You can probably use the exception handling already present in the
function to fix this. Note that ckfree() should only be used while
INTOFF is in effect, both to avoid longjmp'ing out of free() and to
ensure exactly one free in the presence of interruptions and errors.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug? "fstat64(f, &sb) < 0 && S_ISREG(sb.st_mode)"

2016-10-26 Thread Jilles Tjoelker
On Tue, Oct 25, 2016 at 12:13:51PM -0500, Eduardo Bustamante wrote:
> I see the change was introduced here
> http://git.kernel.org/cgit/utils/dash/dash.git/commit/?id=f78674ed6f95b594dcac0e96d6a76c5f64aa2cbf,
> by importing code from FreeBSD's sh.

> What I find interesting is testing the contents of 'sb', when the
> fstat64 call failed. FreeBSD's code has the opposite:
> https://github.com/freebsd/freebsd/blob/master/bin/sh/redir.c#L200,
> i.e. they do:

> if (fstat(f, &sb) != -1 && S_ISREG(sb.st_mode)) {

> BTW, that code was introduced here:
> https://github.com/freebsd/freebsd/commit/e3cb9d1015e8283f4f9d116cf50f510741f8c0dd

The FreeBSD code is how it was intended to be. I don't know how the dash
bug was introduced. The bug only appears when a parallel actor replaces
an openable non-regular file (such as a symlink to /dev/null) with a
regular file and may cause corruption of the file.

As noted on austin-group-l recently, the code does suffer from an issue
where a noclobber open may fail if a parallel actor deletes a
non-regular file (stat reports non-regular file, open fails). Many other
shells have this issue, though.

> Also, the way the open is performed has an interesting consequence,
> i.e. the redirection operation on a FIFO blocks until something is
> written to the FIFO:

> hp% dash -Cc 'rm ff; mkfifo ff; echo a > ff'
> ^Cdash: 1: cannot create ff: Interrupted system call

> It seems to be the same case for mksh, ksh93, posh, pdksh. Not that it
> matters though.

Blocking is intended behaviour when opening a fifo without a
counterparty. It can be used to good effect for synchronizing processes,
sometimes even without transferring any data.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash tested against ash testsuite: 17 failures

2016-10-10 Thread Jilles Tjoelker
the tables of signal names in the  header defined in XBD
> Headers; for example, HUP, INT, QUIT, TERM. Implementations may permit
> names with the SIG prefix or ignore case in signal names as an extension."

> > ash-signals/signal4.tests
> > trap "echo Trapped" BADNAME TERM - abort seeing BADNAME.
> > bash complains, but sets the trap for the second signal.

> I don't see how this is a bug.

POSIX is not entirely clear but there is some text in XCU 1.4 Utility
Description Defaults CONSEQUENCES OF ERRORS that suggests subsequent
operands should continue to be processed after an erroring operand,
which is almost ubiquitously followed by utilities such as cp, mv, rm,
find and ls. However, the same section also says that "Default" means
any changes to the process state are unspecified.

In FreeBSD sh (SVN r199641), I considered it most sensible to continue
processing subsequent signal names (returning exit status 1 later).

> > ash-signals/signal8.tests
> > "kill %1" in non-interactive shell does not find
> > previously backgrounded task.

> Not a bug. '%1' is a job control job ID, but in a non-interactive shell
> there's no job control. Either 'set -m' to turn on job control first
> (and deal with extra terminal clutter), or just use "$!" to kill by PID.

In my reading of POSIX, there is nothing that makes the %job notation
depend on 'set -m'. However, the description of %job operands in kill
requires a signal to be sent to a background process group, which does
not exist if 'set -m' was not in effect when the job was created.

An obvious generalization is to send the signal to each process in the
job if 'set -m' was not in effect when it was created.

The %1 syntax avoids process ID reuse issues but unfortunately it cannot
usually be used if there are multiple background jobs since it may be
unclear when a job number becomes available for reuse.

> > ash-vars/var-utf8-length.tests
> > ${#VAR} counts unicode chars in bash

> dash does not support unicode; AFAIK, this is a design choice (I'm not a
> developer).

That may be but it does cause a non-compliance when the entire system
wants to support UTF-8.

> > ash-vars/var_unbackslash.tests
> > b=-$a-\t-\\-\"-\`-\--\z-\*-\?-
> > b="-$a-\t-\\-\"-\`-\--\z-\*-\?-"
> > b='-$a-\t-\\-\"-\`-\--\z-\*-\?-'
> > b1=$b
> > "b1=$b"
> > You can imagine... not everything went well here...

> Again, this is probably down to legitimate differences in the
> notoriously unportable "echo" builtin. Test this using printf instead.

Seems like it.

> > ash-vars/var_unbackslash.tests

> ITYM ash-vars/var_unbackslash1.tests

> > echo Forty two:$\
> > (\
> > (\
> > 42\
> > )\
> > )
> > dash says: Syntax error: Missing '))'

> Yes, but it's not clear to me that it shouldn't.

> Hmm... maybe this is indeed a bug:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01
> "A  that is not quoted shall preserve the literal value of
> the following character, with the exception of a . If a
>  follows the , the shell shall interpret this as
> line continuation. The  and  shall be removed before
> splitting the input into tokens. Since the escaped  is removed
> entirely from the input and is not replaced by any white space, it
> cannot serve as a token separator."

> So, unless I'm misreading this, it looks like backslashes need to be
> parsed before *any* other kind of lexical analysis.

Yes, for  sequences that are not quoted.

For example,  contains
two characters between the quotes, not zero.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] regression in builtin echo

2016-09-02 Thread Jilles Tjoelker
On Fri, Sep 02, 2016 at 10:19:12PM +0800, Herbert Xu wrote:
> On Fri, Sep 02, 2016 at 04:16:31PM +0200, Jilles Tjoelker wrote:

> > Unlike Harald van Dijk's patch, the above patch breaks \c. Per POSIX
> > (XSI option), \c shall cause all characters following it in the
> > arguments to be ignored (so not only in the argument where \c occurs).
> > For example:
> >   echo 'a\cb' c; echo d
> > shall write "ad" followed by a newline.

> Works for me:

> $ build/src/dash -c "echo 'a\cb' c; echo d"
> ad
> $ 

> AFAICS my patch doesn't change \c handling at all.  When we hit
> \c print_escape_str will return 0x100, which guarantees that we
> hit the berak.

You are right. The code is very tricky and my analysis without running
the code was wrong.

I think developing a shell is hard enough already without making the
code so tricky, though.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Parameter expansion, patterns and fnmatch

2016-09-02 Thread Jilles Tjoelker
On Fri, Sep 02, 2016 at 10:04:37PM +0800, Herbert Xu wrote:
> Harald van Dijk  wrote:
> > Yes, this looks like a bug in dash. With the default --disable-fnmatch 
> > code, when dash encounters [ in a pattern, it immediately treats the 
> > following characters as part of the set. If it then encounters the end 
> > of the pattern without having seen a matching ], it attempts to reset 
> > the state and continue as if [ was treated as a literal character right 
> > from the start. The attempt to reset the state doesn't look right, and 
> > has been like this since at least the initial Git commit in 2005.

> pdksh exhibits the same behaviour:

> $ pdksh -c 'foo=[abc]; echo ${foo#[}'
> [abc]
> $

> POSIX says:

> 9.3.3 BRE Special Characters

> A BRE special character has special properties in certain contexts.
> Outside those contexts, or when preceded by a backslash, such a
> character is a BRE that matches the special character itself. The
> BRE special characters and the contexts in which they have their
> special meaning are as follows:

> .[\
> The period, left-bracket, and backslash shall be special except
> when used in a bracket expression (see RE Bracket Expression). An
> expression containing a '[' that is not preceded by a backslash
> and is not part of a bracket expression produces undefined results.

I think this interpretation of POSIX is incorrect. This is about shell
patterns, not basic regular expressions. Shell patterns are specified in
XCU 2.13 Pattern Matching Notation. In XCU 2.13.1, it is written:

] [
] If an open bracket introduces a bracket expression as in XBD Section
] 9.3.5, except that the  character ('!') shall
] replace the  character ('^') in its role in a non-matching
] list in the regular expression notation, it shall introduce a pattern
] bracket expression. A bracket expression starting with an unquoted
]  character produces unspecified results. Otherwise, '['
] shall match the character itself.

Therefore, pdksh is wrong and the output should be abc].

It is normally better to test against the actively developed mksh
instead of pdksh, but here mksh has the same bug. OpenBSD's ksh also has
some active development but stays closer to the original pdksh.

> > This also affects

> > case [a in [?) echo ok ;; *) echo bad ;; esac

> > which should print ok.

> Even ksh prints bad here.

I think POSIX may be saying something different here from what it really
wants to say. There is text in 2.13.3 Patterns Used for Filename
Expansion that leaves unspecified whether [? matches only the literal
filename component [? or all two-character filename components starting
with [ (other slash-separated components in the same pattern are
unaffected). However, if ksh93 behaves similarly in a case statement,
that may have been what the standard had intended to say.

Looking at as simple code as possible, this seems, however, unhelpful.
Since a pattern like *[ should match the literal string *[ in the choice
where brackets that do not introduce a bracket expression are supposed
to disable other special characters and any earlier work on the * is
therefore wrong, implementing this choice requires an additional scan
for brackets that do not introduce a bracket expression.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] regression in builtin echo

2016-09-02 Thread Jilles Tjoelker
On Fri, Sep 02, 2016 at 09:14:39PM +0800, Herbert Xu wrote:
> Harald van Dijk  wrote:

> > While the original code implementing the echo command was overly 
> > complicated, the simplified version does not do the right thing, as you 
> > noticed.

> Indeed.

> However, we don't need to rewrite the function to fix this.

> ---8<---
> Subject: builtin: Fix echo -n early termination

> The commit 7a784244625d5489c0fc779201c349555dc5f8bc ("[BUILTIN]
> Simplify echo command") broke echo -n by making it always terminate
> after printing the first argument.

> This patch fixes this by only terminating when we have reached
> the end of the arguments.

> Fixes: 7a784244625d ("[BUILTIN] Simplify echo command")
> Reported-by: Luigi Tarenga 
> Signed-off-by: Herbert Xu 

> diff --git a/src/bltin/printf.c b/src/bltin/printf.c
> index 1112253..a626cee 100644
> --- a/src/bltin/printf.c
> +++ b/src/bltin/printf.c
> @@ -459,7 +459,7 @@ echocmd(int argc, char **argv)
>  
>   if (likely(*argv))
>   nonl += print_escape_str("%s", NULL, NULL, *argv++);
> - if (nonl > 0)
> + if (likely((nonl + !*argv) > 1))
>   break;
>  
>   c = *argv ? ' ' : '\n';

Unlike Harald van Dijk's patch, the above patch breaks \c. Per POSIX
(XSI option), \c shall cause all characters following it in the
arguments to be ignored (so not only in the argument where \c occurs).
For example:
  echo 'a\cb' c; echo d
shall write "ad" followed by a newline.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: declaration utilities (was: [bug-diffutils] bug#24116: [platform-testers] new snapshot available: diffutils-3.3.50-0353)

2016-08-23 Thread Jilles Tjoelker
[Cc trimmed]
On Fri, Aug 05, 2016 at 07:15:02PM +0200, Harald van Dijk wrote:
> On 05/08/2016 18:21, Eric Blake wrote:
> > On 08/05/2016 07:13 AM, Harald van Dijk wrote:
> >> Unfortunately, POSIX currently requires the export command to not have
> >> any magic quoting, and any POSIX-conforming shell will make

> >> a="b c=d"
> >> export a=$a

> >> set a to b, and c to d. Not so with bash, but that's because bash simply
> >> isn't POSIX-conforming, even if invoked as sh.

> > Not quite so fast.

> >> POSIX will require special quoting rules for the export command in the
> >> future, similar to what bash does today. When it does, dash is likely to
> >> change to match that, and the local command will likely be changed to
> >> work the same way.

> > POSIX has already issued an interpretation request, and therefore, any
> > CURRENT implementations (including bash) that already behave as
> > permitted by the interpretation ARE conforming. [...]

> > http://austingroupbugs.net/view.php?id=351

> An interpretation request doesn't automatically mean that all current 
> implementations are conforming, does it? It only means that when the 
> interpretation response says so. In this case, the response says that 
> the standard does not allow bash/ksh's behaviour.

> http://austingroupbugs.net/view.php?id=351#c865:

> > Interpretation response
> > 
> > The standard states that the current behavior of ksh93 does not
> > conform, and conforming implementations must conform to this.
> > However, concerns have been raised about this which are being
> > referred to the sponsor.

> The fact that this is seen as a defect in the standard which will be 
> fixed, rather than a defect in bash/ksh, doesn't make bash/ksh conform, 
> it merely means that bash/ksh shouldn't be changed to conform.

Although it is certainly true that the current standard forbids special
quoting rules for export and readonly, making this change sooner rather
than later will be helpful to users.

Looking at ChangeLog.O, I can see dash had special quoting rules for
export, readonly and local between 1997 and 2001. They were removed,
basically, because there was no specification guidance. The
specification guidance is now available in the form of Austin group bug
#351.

Users of FreeBSD sh have likewise liked having these new rules
available.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] histedit: Remove non-glibc fallback code

2016-08-08 Thread Jilles Tjoelker
On Thu, Aug 04, 2016 at 05:59:08PM +0200, Jilles Tjoelker wrote:
> >From 23da600dcff616662a93f307420d9142598e2cae Mon Sep 17 00:00:00 2001
> From: Jilles Tjoelker 
> Date: Thu, 4 Aug 2016 17:51:12 +0200
> Subject: [PATCH 1/2] [HISTEDIT] Stop depending on getopt reset feature.

> Instead, use our own nextopt() function.
> ---
>  src/histedit.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/src/histedit.c b/src/histedit.c
> index 94465d7..ec45065 100644
> --- a/src/histedit.c
> +++ b/src/histedit.c
> @@ -214,13 +214,8 @@ histcmd(int argc, char **argv)
>   if (argc == 1)
>   sh_error("missing history argument");
>  
> -#ifdef __GLIBC__
> - optind = 0;
> -#else
> - optreset = 1; optind = 1; /* initialize getopt */
> -#endif
>   while (not_fcnumber(argv[optind]) &&
> -   (ch = getopt(argc, argv, ":e:lnrs")) != -1)
> +   (ch = nextopt(":e:lnrs")) != '\0')
>   switch ((char)ch) {
>   case 'e':
>   editor = optionarg;

This is clearly wrong; not_fcnumber() should be passed *argptr instead
of something bogus depending on optind.

The fixed version is what FreeBSD sh has as of SVN r240541 but I have
not tested it in dash.

In any case, a side effect of this change is a small code size
reduction.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] histedit: Remove non-glibc fallback code

2016-08-04 Thread Jilles Tjoelker
On Thu, Aug 04, 2016 at 01:54:11AM -0400, Kylie McClain wrote:
> From: Kylie McClain 

> This fallback code seems to go back the import of the repository back
> in 2005, it fails on musl libc, and there aren't any comments
> explaining why this difference is needed. Regardless, any
> compatibility ifdefs should probably be checking macros defined on
> systems that need a different code path, rather than just having
> fallback code for non-glibc.

Setting optreset is required on FreeBSD and NetBSD which do not
recognize setting optind to 0. POSIX specifies neither of these methods
to reset getopt().

However, removing the dependency on this feature is simple. The shell
already features a function nextopt() that can be used to parse options
and is automatically set up for builtins.

>From 23da600dcff616662a93f307420d9142598e2cae Mon Sep 17 00:00:00 2001
From: Jilles Tjoelker 
Date: Thu, 4 Aug 2016 17:51:12 +0200
Subject: [PATCH 1/2] [HISTEDIT] Stop depending on getopt reset feature.

Instead, use our own nextopt() function.
---
 src/histedit.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/src/histedit.c b/src/histedit.c
index 94465d7..ec45065 100644
--- a/src/histedit.c
+++ b/src/histedit.c
@@ -214,13 +214,8 @@ histcmd(int argc, char **argv)
if (argc == 1)
sh_error("missing history argument");
 
-#ifdef __GLIBC__
-   optind = 0;
-#else
-   optreset = 1; optind = 1; /* initialize getopt */
-#endif
while (not_fcnumber(argv[optind]) &&
- (ch = getopt(argc, argv, ":e:lnrs")) != -1)
+ (ch = nextopt(":e:lnrs")) != '\0')
switch ((char)ch) {
    case 'e':
editor = optionarg;
-- 
2.9.2

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Declaring local variables inside while loop leads to memory leak

2016-07-26 Thread Jilles Tjoelker
On Tue, Jul 26, 2016 at 01:35:53PM +0300, Vladimir Didenko wrote:
> I found that declaring local variables inside while loop leads to
> memory leak. Code sample:

> test()
> {
>while [ true ]; do
>   local a=
>done
> }

> test

This can indeed consume a lot of memory. The memory is freed when the
function returns.

This could be fixed by adding a check before making a variable local but
that might make functions with many distinct locals slower.

Also note that the problem does not occur if local is used at the top of
a function only, as recommended by the man page.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: heredoc and subshell

2016-02-23 Thread Jilles Tjoelker
On Wed, Feb 24, 2016 at 01:07:44AM +0300, Oleg Bulatov wrote:
> trying to minimize a shell code I found an unobvious moment with
> heredocs and subshells.

> Is it specified by POSIX how next code should be parsed? dash output
> for this code differs from bash and zsh.

> --- code
> prefix() { sed -e "s/^/$1:/"; }
> DASH_CODE() { :; }
> 
> prefix A < echo line 1
> XXX
> echo line 2)" && prefix DASH_CODE < echo line 3
> XXX
> echo line 4)"
> echo line 5
> DASH_CODE

> --- bash 4.3.42 output:
> A:echo line 3
> B:echo line 1
> line 2
> DASH_CODE:echo line 4)"
> DASH_CODE:echo line 5

> --- dash 0.5.8 output:
> A:echo line 1
> B:echo line 2)" && prefix DASH_CODE < B:echo line 3
> line 4
> line 5

I think POSIX is clear that the bash/zsh behaviour is correct and the
dash behaviour is wrong. In XCU 2.6.3 Command Substitution, it says:

] With the $(command) form, all characters following the open
] parenthesis to the matching closing parenthesis constitute the
] command.

Therefore, the shell should not start reading the here-document
belonging to  prefix A 

Re: dash: read does not ignore trailing spaces

2015-12-02 Thread Jilles Tjoelker
On Wed, Dec 02, 2015 at 11:37:17PM +0100, Gioele Barabucci wrote:
> I am forwarding a bug [1] reported by a Debian user: `read` does not
> ignore trailing spaces. The current version of dash is affected by
> this bug.

> A simple test from the original reporter:

> $ dash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
> 

> $ bash -c 'echo "  a b  " | { read v ; echo "<$v>" ; }'
> 

> Other shells like posh and mksh behave like bash.

> This error is reproducible with dash 0.5.7 and with the current master
> git master branch, commit 2e5842258bd5b252ffdaa630db09c9a19a9717ca.

> [1] https://bugs.debian.org/794965

This is a valid bug. Note that it only occurs when there are more fields
than variables. For example,
  dash -c 'echo "  a  " | { read v ; echo "<$v>" ; }'
correctly prints .

Since dash has its own code for read's splitting, it is not possible to
take a fix from NetBSD or FreeBSD sh, other than by replacing the
splitting code completely with their version.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shell: dash - large file support

2015-07-26 Thread Jilles Tjoelker
On Fri, Jul 24, 2015 at 02:50:48PM -0400, Aleksandar Ristovski wrote:
> For builds that build 32-bit dash, configure misses to setup large
> file support resulting in issues with large files.

> For example:
> ...dp/dash-0.5.8/build$ ls -l /tmp/largefile.sh
> -rw-rw-r-- 1 aristovski aristovski 3794653588 Jul 24 14:12 /tmp/largefile.sh
> ...dp/dash-0.5.8/build$ ./src/dash /tmp/largefile.sh
> ./src/dash: 0: Can't open /tmp/largefile.sh

> Where dash was configured (on a 64-bit platform) like so:

> ...dp/dash-0.5.8/build$ CFLAGS=-m32 ../configure && make -j8

> Now, if I make the change in configure.ac and reconfigure the project,
> I get proper operation:

> ...0.5.8.patched/build$ ./src/dash /tmp/largefile.sh
> Running...
> ...

> (the actual working of the 'largefile.sh' is removed for brevity)

> Patch:

> --- dash-0.5.8/configure.ac 2014-09-28 04:19:32.0 -0400
> +++ dash-0.5.8.patched/configure.ac 2015-07-24 14:41:05.855055430 -0400
> @@ -4,6 +4,9 @@ AC_CONFIG_SRCDIR([src/main.c])
> 
>  AC_CONFIG_HEADERS(config.h)
> 
> +dnl On 32-bit builds, check for large file support
> +AC_SYS_LARGEFILE
> +
>  dnl Checks for programs.
>  AC_PROG_CC
>  AC_GNU_SOURCE

If I were the maintainer, I would use AC_SYS_LARGEFILE and remove all
complexity related to using largefile interfaces in some situations and
non-largefile interfaces in other situations.

However, the maintainer seems to prefer using largefile interfaces only
when needed, apparently to reduce the size of the binary. In the case of
large shell scripts, this is POSIXly correct (see XCU 1.5 Considerations
for Utilities in Support of Files of Arbitrary Size).

However, using a non-largefile stat() or lstat() is always wrong, since
the inode number may not fit in a non-largefile ino_t. This happens in
various cases: test -ef/-nt/-ot, finding a dot script (. special
builtin) and some related to the current directory.

On another note, test -nt/-ot should really compare the nanoseconds
parts as well.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ignored alias inside for loop

2015-07-26 Thread Jilles Tjoelker
On Sun, Jul 26, 2015 at 04:09:12PM +, Rusty Bird wrote:
> > Aliases are expanded during parsing, not during execution (like 
> > functions are). The for loop is parsed completely before it is
> > executed.

> Ah, what confused me was that I wrongly remembered it working inside
> an if (instead of for) block. But looking again now, only the alias
> *definition* was inside.

> > Using a function instead of an alias, or using eval will do what
> > you want.

> Do you mean replacing the line with: eval "alias foobarbaz='echo ok'"
> That doesn't seem to work either (in dash or bash).

eval foobarbaz

This may not be practical since it needs to be done for every alias use.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ignored alias inside for loop

2015-07-26 Thread Jilles Tjoelker
On Sun, Jul 26, 2015 at 09:23:52AM +, Rusty Bird wrote:
> Why does this ...

> #!/bin/sh
> for i in 1; do
> alias foobarbaz='echo ok'
> foobarbaz
> done

> ... result in "foobarbaz: not found", but without the for loop it works?
> Maybe I'm missing something in the spec, because bash-as-sh behaves
> the same.

Aliases are expanded during parsing, not during execution (like
functions are). The for loop is parsed completely before it is executed.

Using a function instead of an alias, or using eval will do what you
want.

Note that some parts of the parse/execute split are not specified by
POSIX. In particular, ksh93 and the real Bourne shell parse dot scripts
completely before executing anything, while most other shells parse as
needed. Example: given the file alias1.sh below:

alias foobarbaz='echo ok'
foobarbaz

The result is ok when running  dash -c '. ./alias1.sh'  but a command
not found error message when running  ksh93 -c '. ./alias1.sh'.

Similarly, a string as passed to the -c option or eval or trap builtins
may or may not be parsed fully before execution.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: getopts doesn't properly update OPTIND when called from function

2015-06-01 Thread Jilles Tjoelker
On Mon, Jun 01, 2015 at 07:30:46PM +0200, Harald van Dijk wrote:
> On 01/06/2015 08:29, Herbert Xu wrote:
> > On Fri, May 29, 2015 at 07:50:09AM +0200, Harald van Dijk wrote:

> >> But the test script in this thread does invoke getopts with
> >> parameters that are the same in all invocations, and without
> >> modifying OPTIND. I don't see anything else in the normative
> >> sections that would make the result undefined or unspecified either.
> >> I do think the script is valid, and the results in dash should match
> >> those of other shells.

> > The bash behaviour makes it impossible to call shell functions
> > that invoke getopts while in the middle of an getopts loop.

> > IMHO the dash behaviour makes a lot more sense since a function
> > always brings with it a new set of parameters.  That plus the
> > fact that this behaviour has been there since day one makes me
> > reluctant to change it since the POSIX wording is not all that
> > clear.

> True. Given that almost no shell supports that anyway, there can't be 
> too many scripts that rely on it, but I did warn about the risk of 
> breaking another type of existing scripts as well, I agree that's a real 
> concern.

FreeBSD sh inherits similar code from ash and likewise has per-function
getopts state. Various shell scripts in the FreeBSD base system use
getopts in functions without setting OPTIND=1.

> One thing that doesn't really make sense, though: if the getopts 
> internal state is local to a function, then OPTIND and OPTARG really 
> should be too. Because they aren't, nested getopts loops already don't 
> really work in a useful way in dash, because the inner loop overwrites 
> the OPTIND and OPTARG variables. While OPTARG will typically be checked 
> right at the start of the loop, before any inner loop executes, OPTIND 
> is typically used at the end of the loop, in combination with the shift 
> command. The current behaviour makes the OPTIND value in that case 
> unreliable.

First, note that the OPTARG and OPTIND shell variables are not an input
to getopts, except for an assignment OPTIND=1 (restoring an OPTIND local
at function return does not reset getopts), and that getopts writes
OPTIND no matter whether getopts's internal optind changed in this
invocation.

With that, the value of OPTIND generally used in scripts is not
unreliable. OPTIND is generally only checked after getopts returned
false (end of options), in the sequence
  while getopts ...; do
...
  done
  shift "$((OPTIND - 1))"

> So either way, I think something should change. But if you prefer to get 
> clarification first about the intended meaning of the POSIX wording, 
> that certainly seems reasonable to me.

I think the POSIX wording is clear enough, but it may not be desirable
to change getopts to work that way.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possibly wrong handling of $_?

2014-12-25 Thread Jilles Tjoelker
On Wed, Dec 24, 2014 at 12:33:32AM +0100, Vadim Zeitlin wrote:
> On Wed, 24 Dec 2014 00:21:09 +0100 Harald van Dijk  wrote:

> HvD> On 23/12/2014 23:34, Vadim Zeitlin wrote:
> HvD> >   Hello,
> HvD> >
> HvD> >   I'm not exactly sure if this is a bug because I didn't find any
> HvD> > specification about how is this supposed to behave (to my surprise it
> HvD> > turned out that $_ was not in POSIX), but please consider this:
> HvD> >
> HvD> >% zsh -c 'echo -n foo && echo $_'
> HvD> >foofoo
> HvD> >% bash -c 'echo -n foo && echo $_'
> HvD> >foofoo
> HvD> >% dash -c 'echo -n foo && echo $_'
> HvD> >foo/usr/bin/dash
> HvD> 
> HvD> This does come across as somewhat confusing, but $_ is really not a 
> HvD> special variable at all in dash.

>  Ah, this does explain it, thanks!

Dash does implement $_, but only in interactive mode.

Your selection of shells makes it appear as if dash is the odd one out,
but in fact it is not. FreeBSD /bin/sh (also an Almquist shell
derivative), mksh and ksh93 also only implement $_ in interactive mode.
Details of how $_ differ as well, particularly in ksh93. It seems unwise
to use this feature in scripts.

> HvD> If dash did something special with $_, then I agree it would be nice if 
> HvD> it would be somewhat compatible with other shells. If dash simply does 
> HvD> not implement a feature, that feature is not required by any standard, 
> HvD> and that feature is not widely used, then I suspect there won't be a lot 
> HvD> of interest in implementing that feature.

>  Yes, I understand, somehow the idea that dash didn't implement it at all
> just didn't occur to me, but clearly adding a non-standard new feature is
> not nowhere near as important as fixing [what looked like] a bug. I'm not
> sure about the "not widely part", but I don't have any non-anecdotal
> evidence one way or the other.

> HvD> Don't be put off by that, though. You are free, of course, if you feel 
> HvD> so, to attempt to convince people $_ is an important feature that all 
> HvD> shells should implement. If you have compelling use cases, if it solves 
> HvD> real problems, and if many other shells already implement it, you might 
> HvD> even get it standardised. I have never seen a need for it, but that's 
> HvD> just me speaking from personal experience, others may feel differently.

>  FWIW my initial problem started with using

>   builddir := $(shell mkdir -p some-long-make-expression && echo $_)

> in a makefile, which seemed like a nice way to assign the value to the
> variable only if the directory was successfully created. This can, of
> course, be done in a myriad of other ways, but this one just seemed like
> the most elegant to me. Whether this counts as a "need" or not is not for
> me to say.

>  Personally I'd say the main argument for adding support for "$_" to dash
> would be to avoid mysterious problems like the one I just had because dash
> silently (i.e. without giving any errors) behaves differently from the
> other shells, which in my case resulted in the makefile misbehaving only
> under Debian (where dash is used as /bin/sh) but not under systems.

Such a principle of least surprise is not a design goal for dash.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: command substitutions in $PS4

2014-11-30 Thread Jilles Tjoelker
On Mon, Nov 17, 2014 at 09:49:32PM +, Stephane Chazelas wrote:
> [tested on current git head
> f6d4def4e27b13fab174e948b94cd10550d3e10e]

> Command substitution doesn't seem to work in $PS4 (used for
> xtrace prompt):

> $ PS4='$(date +%T)> ' dash -xc :
> dash: 1: Syntax error: end of file unexpected (expecting ")")

> And with the old syntax:

> $ PS4='`date +%T`> ' dash -xc :
> > :

> And with more than one command:

> $ PS4='`date +%T`> ' dash -xc ':;:'
> > :

> dash doesn't return and seems to go in a forking loop,
> presumably because the `date` there triggers another PS4
> expansion and so on recursively

> If I prevent the recursion with:

> --- a/src/eval.c
> +++ b/src/eval.c
> @@ -776,7 +776,15 @@ evalcommand(union node *cmd, int flags)
>   int sep;
>  
>   out = &preverrout;
> +
> + /*
> +  * reset xflag temporarily for command substitutions performed
> +  * upon $PS4 expansion
> +  */
> + xflag = 0;
>   outstr(expandstr(ps4val()), out);
> + xflag = 1;
> +
>   sep = 0;
>   sep = eprintlist(out, varlist.list, sep);
>   eprintlist(out, arglist.list, sep);

> $ PS4='`date +%T`> ' ./src/dash -xc ':;sleep 1; date'
> > :
> 21:43:47> sleep 1
> 21:43:48> date
> Mon Nov 17 21:43:48 GMT 2014

> The command substitution still fails upon the first expansion
> only. I quickly gave up trying to find out why as I found the
> code there hard to follow.

In FreeBSD sh, I have disabled command substitution in PS4 (POSIX does
not require it anyway). It is treated as a syntax error.

Talking about syntax errors, dash has a problem there as well. If there
is a syntax error in PS4 and set -x is enabled, it will not execute any
simple commands (assignments to PS4 cause an error and are subsequently
reverted). FreeBSD sh prints the error messages and subsequently uses
the raw unexpanded text.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Line continuation and variables

2014-10-29 Thread Jilles Tjoelker
On Mon, Sep 29, 2014 at 10:55:07PM +0800, Herbert Xu wrote:
> On Tue, Aug 26, 2014 at 12:34:42PM +, Eric Blake wrote:
> [snip]
> > So the fact that dash is treating the elided backslash-newline as a
> > token separator, and parsing your input as if ${EDIT}OR instead of
> > ${EDITOR} is a bug in dash.

> I agree.  The following patch should fix this:

> commit ef91d3d6a4c39421fd3a391e02cd82f9f3aee4a8
> Author: Herbert Xu 
> Date:   Mon Sep 29 22:52:41 2014 +0800

> [PARSER] Handle backslash newlines properly after dollar sign
> [snip]

> diff --git a/ChangeLog b/ChangeLog
> index 0fbc514..398bd15 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,6 +1,7 @@
>  2014-09-29  Herbert Xu 
>  
>   * Kill pgetc_macro.
> + * Handle backslash newlines properly after dollar sign.
>  
>  2014-09-28  Herbert Xu 
>  
> diff --git a/src/parser.c b/src/parser.c
> index c4eaae2..2b07437 100644
> --- a/src/parser.c
> +++ b/src/parser.c
> @@ -827,6 +827,24 @@ breakloop:
>  #undef RETURN
>  }
>  
> +static int pgetc_eatbnl(void)
> +{
> + int c;
> +
> + while ((c = pgetc()) == '\\') {
> + if (pgetc() != '\n') {
> + pungetc();
> + break;
> + }
> +
> + plinno++;
> + if (doprompt)
> + setprompt(2);
> + }
> +
> + return c;
> +}
> +
>  
>  
>  /*

This implementation of pgetc_eatbnl() does not allow pushing back a
backslash, since that would call pungetc() twice without an intervening
pgetc(). However, some places do attempt to push back a backslash. As a
result, a script file containing many repeated  ${w#\#}  will not be
parsed correctly. There is a similar bug with repeated  $\#  but this is
not specified by POSIX.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "command -p" does not correctly limit search to a safe PATH

2014-09-27 Thread Jilles Tjoelker
On Fri, Sep 26, 2014 at 05:19:42PM +0800, Herbert Xu wrote:
> On Fri, Jul 19, 2013 at 09:49:31PM +, Harald van Dijk wrote:
> >
> > So, how about this, to be applied on top of my previous patch? It
> > defaults to using confstr() if available and reporting a hard error at
> > run time if that fails, but it can be configured to not use confstr(),
> > and/or fall back to a path specified at configuration time:

> Thanks for the patch.  But until someone who needs this complexity
> steps up, I'm going to stick with the simpler version below:

> [snip]
> diff --git a/src/var.h b/src/var.h
> index 79ee71a..872e2db 100644
> --- a/src/var.h
> +++ b/src/var.h
> @@ -107,7 +107,7 @@ extern const char defifsvar[];
>  extern const char defifs[];
>  #endif
>  extern const char defpathvar[];
> -#define defpath (defpathvar + 5)
> +#define defpath (defpathvar + 36)
>  
>  extern int lineno;
>  extern char linenovar[];

This needs a comment at the definition of defpathvar in var.c;
otherwise, someone changing the default path will subtly break command
-p without knowing. The number 36 is rather magic too, but it can be
found back through git history.

Alternatively, you could rely on the linker combining common string
constant endings: put in some #define for
"/usr/sbin:/usr/bin:/sbin:/bin" and make defpathvar a #define instead of
a const array.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in man page

2014-03-14 Thread Jilles Tjoelker
On Sun, Mar 09, 2014 at 12:11:43PM +0100, Jeroen van Dijke wrote:
> There seems to be a bug in the dash man page, at least in 0.5.7. It
> reads:

> Precision:
> An optional period, `.', followed by an optional
> digit string giving a precision which specifies the number of digits
> to appear after the decimal point, for e and f formats, or the maximum
> number of *characters* to be printed from a string (b and s formats);
> if the digit string is missing, the precision is treated as zero;

> dash behaves cuts to the number of bytes

> $ length=10; printf "%.${length}s\n" "e"
> ee
> $ length=10; printf "%.${length}s\n" "ë”
> ë

> The  POSIX specification (2008) says:

> precision Gives the minimum number of digits to appear for the d, o,
> i, u, x, or X conversion specifiers (the field is padded with leading
> zeros), the number of digits to appear after the radix character for
> the e and f conversion specifiers, the maximum number of significant
> digits for the g conversion specifier; or the maximum number of
> *bytes* to be written from a string in the s conversion specifier. The
> precision shall take the form of a ( '.' ) followed by a decimal digit
> string; a null digit string is treated as zero.

> So it seems to me that “characters” should be changed to “bytes”.

Indeed, and the same applies to the field width. This behaviour may not
be the most useful, but it is standard and widely implemented.

Likewise, the sequences \num (in the format string) and \0num (in
arguments for %b) generate bytes, not characters.

On another note, the format string is said to be a "character string";
this may be a C'ism (meaning that it is not a wide character string).

Note that a "byte" in POSIX terminology is a "character" in C standard
terminology. I think the former is less ambiguous in general and should
be the preferred term in man pages where a unit of 8 bits is referred
to. A POSIX "character" may be more than one byte long.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exec command and error checking

2014-01-28 Thread Jilles Tjoelker
On Tue, Jan 28, 2014 at 02:17:59PM +0100, Guido Berhoerster wrote:
> * Марк Коренберг  [2014-01-28 13:16]:
> > $ dpkg -l | fgrep dash
> > ii  dash   0.5.7-2ubuntu2
> > POSIX-compliant shell

> > $ exec 9 > dash: 1: cannot open no_such_file: No such file

> > $ exec 9 > dash: 2: cannot open no_such_file: No such file

> > So, I cannot test this operation without using $?

> No, exec is a special built in and POSIX specifies that
> ...if a redirection error occurs (see Consequences of Shell
> Errors), the shell shall exit with a value in the range 1-125

> dash correctly exits with exit status of 2 as it should. ksh93,
> mksh, and pdksh do the same.

Indeed, this is correct.

You can avoid the exit by prepending 'command':
$ command exec 9 > in BASH this works as expected (even in sh mode)

> That's either a bug or an intended deviation from the POSIX
> standard, you'll have to ask on the bug-bash list about that.

The inconsistency appears to be in the behaviour on fatal errors in
interactive shells. Strictly speaking, POSIX seems to require that the
shell continue execution with the next command, setting $? to a non-zero
value. Historically, behaviour has instead been to exit with a non-zero
status (if in a subshell) or return to the prompt with $? set to a
non-zero value (if in the top level shell). Dash implements the latter
and I think it is more useful.

Note that the two behaviours are indistinguishable if a single simple
command is entered.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: glob pattern and redirected input file name

2013-08-30 Thread Jilles Tjoelker
On Thu, Aug 29, 2013 at 04:49:12PM -0700, Jack Bates wrote:
> What is DASH supposed to do when input is redirected from a file,
> and the file name is a glob pattern? e.g.

>tar xz < foo-*.tar.gz

> Is it supposed to expand the glob pattern, or is that not supported?

Per POSIX XCU 2.7 Redirection, pathname generation may (but need not) be
performed on the word after a redirection operator other than << or <<-
if the shell is interactive and one word would result.

Dash chooses the option that results in the smallest code: never
performing pathname generation in this case.

> The following both work, is there a better workaround?

>tar fxz foo-*.tar.gz

>tar xz < $(echo foo-*.tar.gz)

These are both concise methods. The former's problem is that it does
something strange if more than one file matches. The second has problems
with pathnames starting with '-', containing backslashes or ending with
newlines.

In a script you might do
set -- foo-*.tar.gz
if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "Bad wildcard"
exit 2
fi
tar -xzf "$1"

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: bug? Spawned childs always remain in zombie state

2013-08-23 Thread Jilles Tjoelker
On Fri, Aug 23, 2013 at 08:44:44PM +1000, Herbert Xu wrote:
> On Sun, Jan 20, 2013 at 12:04:04PM +, Sjon Hortensius wrote:
> > Hi. I'm trying to create a script which monitors a directory using
> > inotify and spawns a background process for all events. However I
> > found that all childs will remain in zombie state until the script
> > quits and I am unable to find a proper fix.

> > A minimal testcase:

> > #!/bin/dash
> > while true
> > do
> > sleep 1 &
> > #jobs >/dev/null
> > done

> > If you open a second terminal you'll see that all the 'sleep'
> > processes end up being defunct. I have tried playing with `set -ma`
> > but the only workaround I found is the commented 'jobs' line.
> > Uncommenting that line will result in expected behavior where childs
> > are properly reaped. Is this a bug, or is there an alternative
> > solution I'm missing?

> You need to wait on them as otherwise dash has to keep them around
> in case you call wait(1) later on.

The shell need not remember the processes indefinitely. Per POSIX (XCU
2.9.3.1 Asynchronous Lists), the application must reference $! before
starting another asynchronous list if it wants to use the wait builtin
for that particular process later on. (Note that this means that jobs -p
is not good enough even when the job consists of a single process, and
that printing $! can add memory leaks to a script.)

POSIX also restricts an operandless wait builtin to "known process IDs".
This seems inappropriate: there are many scripts that start multiple
asynchronous lists without referencing $! and expect operandless wait to
wait for all of them. Therefore, all jobs must be remembered while they
are running. However, if $! was not referenced for them and they are not
the most recent asynchronous list, they can be discarded when they
terminate. I implemented this in FreeBSD sh and it appears to work well.

An additional issue occurs when multiple asynchronous lists are started
without a foreground process, here-document that requires a fork,
operandless jobs builtin or wait builtin in between. In that case, dash
never calls wait3() and zombies accumulate. Although the example could
be considered a fork bomb and relies on the child processes getting CPU
time often enough, this may legitimately happen if the loop contains a
read builtin. In FreeBSD sh, I added a check for zombies before forking
the first process of a background job. Some other shells call waitpid()
or similar from a SIGCHLD handler; this reaps zombies faster at the cost
of more complex code (signal handler performing non-trivial work).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] implement privmode support in dash

2013-08-22 Thread Jilles Tjoelker
On Thu, Aug 22, 2013 at 09:59:36PM +0200, Harald van Dijk wrote:
> On 22/08/13 19:59, Tavis Ormandy wrote:
> > Hello, this is a patch to add privmode support to dash. privmode attempts to
> > drop privileges by default if the effective uid does not match the uid. This
> > can be disabled with -p, or -o nopriv.

> Your approach definitely has my support (FWTW), but there are two
> aspects that surprised me, and are different from bash and FreeBSD's sh:

> You named the option nopriv, while bash and FBSD use the name
> privileged. I think it is likely to confuse people if "bash -o
> privileged" and "dash -o nopriv" do the same thing, and that it would be
> better to match bash and give the option a positive name, such as
> "priv", or perhaps even match them exactly and use "privileged".

I think there is no reason to deviate from other shells here. Therefore,
please call it "privileged".

> In bash and FBSD, after starting with -p, set +p can be used to drop
> privileges. With your patch, dash accepts set +p, but silently ignores it.

> How does something like the attached, to be applied on top of your
> patch, look?

> [snip]
> + if (!on && (uid != geteuid() || gid != getegid())) {
> + setuid(uid);
> + setgid(gid);
> + /* PS1 might need to be changed accordingly. */
> + choose_ps1();
> + }
> +}

This code tries to use setuid() and setgid() to drop all privilege,
which is only correct if the privilege to be dropped is UID 0, or on BSD
systems. It would be better to use setresuid() or setreuid(), and change
the GID before changing the UID.

Apart from that, it is better to check the return value from setuid()
and similar functions. In particular, some versions of Linux may fail
setuid() for [EAGAIN], leaving the process running with the same
privileges.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sleep builtin

2012-11-02 Thread Jilles Tjoelker
On Thu, Nov 01, 2012 at 07:48:47PM -0600, Raphael Geissert wrote:
> A while ago I wrote a sleep builtin mainly to test its impact in the boot 
> process. The non-scientific results were not very impressive: around 1 
> second.

Anything statistically significant can be impressive, taking into
account how many other things also happen and are unaffected by the
change.

If it is not statistically significant it can just be noise.

> So, I was cleaning up some directories and found the builtin. Instead
> of just nuking it, I'm forwarding it in case anyone wants to play with
> it or even merge it. If merged it will annoy those who are used to
> GNU's sleep(1) which supports the s/m/h/d suffixes, but not those
> using fractions as I added support for them.

> Back then, I also payed attention to the resulting size of the binary,
> which obviously increased by a few KBs. However, I do remember that
> when switching from gcc 4.4 to 4.6 the resulting binary with the sleep
> builtin was smaller than the binary built with 4.4 and without sleep.

> (interestingly, Debian's 0.5.7-3 dash is bigger than in 0.5.5.1-7)

> [patch snipped]

POSIX says any utility may be provided as a builtin as long as this is
not detectable apart from performance (and {ARG_MAX} I guess).

Some ksh variants (ksh93 and mksh) have a sleep builtin, so it is likely
fairly safe to do this.

However, it will need better handling of [EINTR]. The included patch
simply exits with status 2 when it happens, which differs from what an
instance of /bin/sleep would do (assuming the signal is sent to parent
and child). If the signal is one that is ignored by default such as
SIGCHLD, /bin/sleep ignores it and the shell only takes the trap when
sleep is done. If the signal is one that causes termination by default,
/bin/sleep terminates on it and the shell returns exit status 128 plus
the signal number after which it takes the trap. Unfortunately,
determining the default action of a non-standard signal is annoying in a
portable program.

It is probably acceptable that an interactive sleep command cannot be
^Z'ed.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXPAND] Nested parameter expansion results in an empty string when quoted

2012-08-28 Thread Jilles Tjoelker
On Tue, Aug 28, 2012 at 04:27:24PM +0300, Todor Vlaev wrote:
> While playing around with parameter expansion I noticed that the
> following didn't work in dash  (dash 0.5.5.1-7.4ubuntu1) as compared
> to bash even though I believe it should be POSIX-compliant:

> my_str=swan; last_char="${my_str#${my_str%?}}"; echo ${last_char}

> If the double quotes are removed, the last character is printed
> correctly.

The double-quotes here are unnecessary and should be left out for
optimal compatibility. Note that this may require a temporary variable
if the expansion is in a context where word splitting is performed.
However, you are right that the double-quotes are permitted per POSIX.

On the other hand, some double-quotes that are really necessary are
missing:

my_str=swan; last_char=${my_str#"${my_str%?}"}; echo "${last_char}"

The inner double-quotes force any wildcard characters in $my_str to be
taken literally. This is required to work correctly with the string
sw*n, for example.

It also happens to work around dash's bugs, both the above version and a
version with the redundant quotes you like:

my_str=swan; last_char="${my_str#"${my_str%?}"}"; echo "${last_char}"

> At a quick glance through the commits after the 0.5.5.1 release I saw
> the following bug fix. Could it be related?

> 0d7d66039b614b642c775432fd64aa8c11f9a64d
> [EXPAND] Fix quoted pattern patch breakage

It is still broken with master (46abc8c6d8a5e9a5712bdc1312c0b6960eec65a4
[BUILTIN] Add support for ulimit -r 2012-07-03) here.

Fixing this requires a test suite. Without a test suite, any attempt to
fix something is likely to break something else. This is not only
because the expansions are inherently complicated but also because
dash's code for them is hard to understand.

The code for expansions in FreeBSD's sh, a related Almquist shell
variant heavily modified by me, is more correct and slightly easier to
understand. There is also a test suite. However, the dash maintainer
does not agree with some of the choices I have made where POSIX (as
modified by interpretations) is silent.

The mksh, a shell from a different lineage, also has a good set of test
cases.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [BUILTIN] Allow SIG* signal names.

2012-07-03 Thread Jilles Tjoelker
On Mon, Jul 02, 2012 at 08:22:20AM -0600, Eric Blake wrote:
> On 07/02/2012 08:11 AM, Paul Gilmartin wrote:
> > On Jul 2, 2012, at 07:51, Eric Blake wrote:

> >> ... non-required bloat ...

> > The key phrase.  And one value of a shell lacking such
> > extensions is that it provides an excellent test bed for
> > code intended to be portable within the POSIX spec.

> That argues that we should drop our strcasecmp() for the much simpler
> strcmp(), in order to remove the bloat we already have.

Such a change would be at odds with the goal of dash to be a usable
shell. It is not good to break existing scripts, even if those scripts
are not strictly POSIX-compliant, and particularly so if the scripts
relied on designed behaviour. Such changes can waste a lot of time.

Leaving the behaviour unchanged (keeping case-insensitivity but not
adding support for the SIG prefix) would strike a balance between
usability and avoiding bloat.

Moreover, dash has many other extensions to POSIX, such as:

* A particular behaviour for "export" without parameters.

* The "local" builtin.

* The XSI "type" builtin (depending on whether you want to allow XSI or
  not).

* XSI-style signal numbers in the "trap" special builtin, and more of
  them than required.

* The XSI "ulimit" builtin, with many more options than the required -f.

* Function definitions consisting of a simple command, like
f() echo hi

* A particular (probably useful) behaviour for double-quotes within
  backticks within double-quotes, like
echo "`echo "hi there"`"

* A particular behaviour for double-quotes within Bourne-style parameter
  substitutions within double-quotes, like
echo "${IFS+"*"}"

* Arithmetic expansion with a larger range than signed long (even though
  POSIX "encourages" this in non-normative text, it is not required).

Although dash is fairly minimalistic and may be useful in testing
scripts (also because it detects certain syntax errors sooner), its
usability goal makes that it has many non-mandatory features. You may
like posh which attempts to comply to Debian policy only, even at the
cost of usability.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Set SA_RESTART flag on SIGCHLD handler

2012-06-22 Thread Jilles Tjoelker
On Thu, Jun 21, 2012 at 07:36:28PM -0500, Jonathan Nieder wrote:
> Anders Kaseorg wrote:
> > On Sun, 18 Mar 2012, Jonathan Nieder wrote:

> >> --- i/src/trap.c
> >> +++ w/src/trap.c
> >> @@ -259,7 +259,7 @@ setsignal(int signo)
> >>act.sa_handler = SIG_DFL;
> >>}
> >>*t = action;
> >> -  act.sa_flags = 0;
> >> +  act.sa_flags = SA_RESTART;
> >>sigfillset(&act.sa_mask);
> >>sigaction(signo, &act, 0);
> >>  }
> [...]
> > It looks like this was never forwarded to the dash mailing list.  Any 
> > thoughts from upstream?

> > Context: http://thread.gmane.org/gmane.comp.version-control.git.debian/147
> > (FTR, it did happen again.)

> Thanks.  Here's a less invasive patch to experiment with.

> POSIX (XCU §2.11 "Signals and Error Handling") requires that the "wait"
> utility must return immediately when interrupted by a trapped signal,
> so SA_RESTART is not appropriate for those.

Same thing for "read", even though that does not make much sense (the
only thing the trap handler can do is abandon the read, since bytes
already read but not part of a complete line may be lost).

On the other hand, this interruption is not specified to happen with
signals whose default action is to ignore. Many shells don't do this,
though (perhaps because of the difficulty of finding what the default
action is).

> It does not seem to give any advice on whether signals interrupt
> redirection.

For certain situations it certainly does. For simple commands without
command name, it says any redirections shall be performed in a subshell
environment; hence, trapped signals are reset to the default action.
This is what most shells appear to do in all cases. Again, this may not
be the most useful behaviour but it matches what the Bourne shell did.

> diff --git i/src/redir.c w/src/redir.c
> index f96a76bc..7fb7748f 100644
> --- i/src/redir.c
> +++ w/src/redir.c
> @@ -206,8 +206,11 @@ openredirect(union node *redir)
>   /* FALLTHROUGH */
>   case NCLOBBER:
>   fname = redir->nfile.expfname;
> - if ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0)
> + while ((f = open64(fname, O_WRONLY|O_CREAT|O_TRUNC, 0666)) < 0) 
> {
> + if (errno == EINTR)
> + continue;
>   goto ecreate;
> + }
>   break;
>   case NAPPEND:
>   fname = redir->nfile.expfname;

It looks like this patch breaks interrupting (via SIGINT) a blocking
open, for example of a fifo. Although the race condition associated with
relying on [EINTR] is very hard to close, I don't think that's reason to
give up on it entirely.

I think it is better to rethink commit
3800d4934391b144fd261a7957aea72ced7d47ea that caused this bug (and some
others that have been fixed). A signal handler for SIGCHLD was added to
allow waiting for a trapped signal or a child process at the same time
(using sigsuspend, as was done, or sigwait). However, leaving the signal
handler installed all the time causes unexpected [EINTR] errors, so I
suggest only installing it during the "wait" builtin or specifying
SA_RESTART only for this special SIGCHLD handler (while leaving all
other signal handlers interrupting). In either case, behaviour for
SIGCHLD handlers installed because of CHLD traps should remain
unchanged. The former seems safest because SA_RESTART handlers still
cause visible effects such as short writes.

In case of a script doing 'trap "" CHLD', it is perhaps undesirable for
"wait" to block forever, given that wait(2) doesn't do that either.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: wait and ctrl+Z

2012-05-03 Thread Jilles Tjoelker
On Fri, May 04, 2012 at 12:28:17AM +0200, Marc Glisse wrote:
> On Fri, 4 May 2012, Herbert Xu wrote:
> > Marc Glisse  wrote:
> >> Hello,

> >> I noticed a strange behavior of "wait" when I suspend and resume a script.

> >> $ cat a.sh
> >> #!/bin/dash
> >> (sleep 7; echo blah) &
> >> (sleep 7; echo bloh) &
> >> wait ; echo coucou
> >> $ ./a.sh
> >> ^Z
> >> zsh: suspended  ./a.sh
> >> $ fg
> >> [1]  + continued  ./a.sh
> >> coucou
> >> $ blah
> >> bloh

> >> As you can see, the instruction after "wait" was executed immediatly on
> >> resume, without waiting for the jobs.

> >> If I replace the ';' after "wait" by "&&" and do the same suspend+resume,
> >> "coucou" is never printed.

> >> I am using dash version 0.5.7-3 in debian testing.

> > That's normal as wait was interrupted by a signal.  If you want
> > to wait even after an interruption, you should check the return
> > value of wait.

> Hello, and thanks for you answer.

> I find that quite surprising. I re-read the posix description of wait, and 
> my understanding is that the return value of wait should depend on what 
> happened to the waited process (exit code, signal), not to wait itself. 
> And other shells seem to agree.

This is not actually said in the XCU 'wait' page but in XCU 2.11 Signals
and Error Handling.

However, it says something subtly different: only a signal for which a
trap has been set should cause 'wait' to return immediately with an exit
status greater than 128.

Because no trap has been set on SIGTSTP, 'wait' should not be
interrupted here and the shell should continue waiting.

Likewise, if the shell internally uses SIGCHLD to get notified about
process termination, this does not interrupt 'wait'; dash implements
that aspect properly.

> Are you suggesting that wait should always be used in a loop? With what 
> check exactly?

Only if you have set any traps that resume execution of the original
script (i.e. do not exit the process).

Otherwise, if 'wait' is being called without parameters, you can do
something like
  until wait; do :; done

If 'wait' is being called with parameters, the required loop is very
complicated.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] dash doesn't report syntax error when it should on stray "fi"

2012-04-24 Thread Jilles Tjoelker
On Mon, Apr 23, 2012 at 02:54:58PM -0500, Jonathan Nieder wrote:
> Chet Ramey wrote:

> > The way I read 2.10.2, Posix requires that the "fi" resolve to the
> > reserved word `fi' in this case, since it's in a position where a
> > command name is expected (Rule 1, [Command Name]).  I would think that
> > would make it a syntax error in any application, conforming or not.

> Yes, I think you're right.

> For some reason I thought that using reserved words in ways not
> permitted by the grammar produced undefined behavior, to support
> shells with extensions like the arithmetic "for".  But now that I read
> more closely, it looks like "function", "[[", "]]", and "select"
> produce unspecified behavior but everything else is pretty well
> specified.  Thanks for clarifying.

In what way does a lone "fi" differ from constructions like the below,
which do not match the POSIX grammar but are accepted by some subset of
shells that attempt to comply to POSIX:

case x in (esac) ;; esac

for i; do echo "$i"; done

for i;
do echo "$i"; done

for i { echo "$i"; }

for ((i = 0; i < 10; i += 1)) do echo "$i"; done

f() :

The only difference seems to be that the lone "fi" is somehow "wrong"
while the above are not "wrong". That's too vague for a specification.

Unfortunately, the XCU volume is very thin on constraints (as found, for
example, in the C standard). Many ways to do things "wrong" are instead
left open for implementation extensions. In practice, many such
extensions exist.

That said, rejecting the lone "fi" is still a quality of implementation
issue, particularly because dash's current behaviour is poorly defined.
I fixed it in FreeBSD a while ago (9.0 has the fix).

In making this change, a testsuite is very useful to guard against bugs
and to ensure no cases are missed, but latent bugs in shell scripts may
still cause annoyance. If there is an error in a command substitution
that is normally not executed, most shells (except ash derivatives) will
not notice this.

For example, FreeBSD 9.0's sh is one of the few shells that detects the
error in:

if false; then `fi`; fi

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: local var=$(cat) only reads one line

2012-04-09 Thread Jilles Tjoelker
On Sun, Apr 08, 2012 at 06:09:06PM -0400, Jim Pryor wrote:
> Hi, I'm not a subscriber to this list and am not sure where's the best
> place to report this bug. A cursory search didn't find other reports
> of the same issue; but I can imagine there may well be such. If so,
> excuse the noise. I'm just trying to help.

> I notice the following issue in the version of dash that's bundled in
> a recent release of finnix (not sure which one, but the kernel version
> is 3.0.6, so it's probably finnix 103, released 23 October 2011). I
> also see it in the FreeBSD sh, from a FreeBSD-9 release candidate I
> compiled in January. I know that's not dash, but I understand the
> codebases are closely related. Neither of these are my active systems;
> hence the fuzzy details.

> Here is a testcase:

> test1() {
> local IN=$(cat)
> printf "test1 <%s>\n" "$IN"
> }

> MSG=$(printf "abc\ndef\nghi")

> printf "%s" "$MSG" | test1

> The weird bit only shows up in test1: IN will only be assigned the first
> line of stdin.

The cause is that the local command expands to
  local IN=abc def ghi
and therefore it sets IN to less than what you expect and also makes two
more variables local.

Now, local is not in POSIX, but export and readonly are and have exactly
the same issue.

Reading POSIX.1-2008 (and also older versions) literally, this way of
splitting is correct. However, ksh93, bash and pdksh think differently,
doing what your script expects. Also, the examples section of export has
  export PATH=/local/bin:$PATH
which assumes that either the value will not contain IFS characters or
metacharacters or it will not be split regardless.

Dash had this behaviour for some time but it was dropped again because
of inconsistencies between implementations.

However, there appears to be some sort of agreement on how it should
work now:
http://austingroupbugs.net/view.php?id=351
(scroll down to note 943).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] var.c: check for valid variable name before printing in "export -p"

2012-02-25 Thread Jilles Tjoelker
On Sun, Feb 26, 2012 at 01:31:55AM +1100, Herbert Xu wrote:
> On Sat, Feb 25, 2012 at 03:30:04PM +0100, Jilles Tjoelker wrote:

> > Most shells pass the environment variable through, such as bash,
> > zsh, ksh93 and most ash derivatives. However, the original Bourne
> > shell and pdksh/mksh do not.

> Do you know of any genuine uses of such environment variables?

No, but I know that I do not know. I recommend noting the (small)
potential for breakage in the changelog. In a large body of software,
subtle features are often depended on somewhere, be it inadvertently.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] var.c: check for valid variable name before printing in "export -p"

2012-02-25 Thread Jilles Tjoelker
On Sat, Feb 25, 2012 at 06:36:36PM +1100, Herbert Xu wrote:
> On Tue, Feb 14, 2012 at 10:48:48AM +, har...@redhat.com wrote:
> > From: Harald Hoyer 

> > "export -p" prints all environment variables, without checking if the
> > environment variable is a valid dash variable name.

> > IMHO, the only valid usecase for "export -p" is to eval the output.

> Thanks a lot for the report and patch.

> I'd prefer to fix this up at entry into the shell rather than
> when we print out the variables.  So how about this patch?

Such a change would change other things than just "set" and "export -p".
It would also not propagate environment variables with invalid names to
child processes. For example:

env -i not-a-name=1 PATH="$PATH" dash -c env | grep not-a-name

Most shells pass the environment variable through, such as bash, zsh,
ksh93 and most ash derivatives. However, the original Bourne shell and
pdksh/mksh do not.

I think it is best to pass them through so that executing a utility with
and without the shell are as similar as possible (most versions of
make(1) assume this is the case by skipping the shell if the command is
simple enough) and particularly for dash for historical/compatibility
reasons.

I did something similar to the OP's patch in FreeBSD (SVN r223183):
"set" and "export -p" skip variables with invalid names.

Note that this problem cannot occur with "readonly -p" because only
variables with a proper name can be marked read-only.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question about job control in non-interactive shells

2012-01-13 Thread Jilles Tjoelker
On Mon, Jan 09, 2012 at 10:37:45AM -0600, Jonathan Nieder wrote:
> Michael Welsh Duggan wrote:
> > I am trying to determine why:

> >   dash -c "sleep 5 & kill %1"

> > results in:

> >   dash: 1: kill: No such process

> You are probably looking for the -m option.

The cause is that the -m option ("job control") enables running commands
in separate process groups, and dash follows literally what POSIX says
about kill %job: a background process group should be signaled; however,
there is no background process group. Some shells signal one or more
processes they know are part of the job in this case, but dash calls
kill() on a process group that is guaranteed not to exist.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: evaluation of env variables in DASH

2011-10-19 Thread Jilles Tjoelker
On Wed, Oct 19, 2011 at 11:24:50PM +0200, Dima Sorkin wrote:
>   The following DASH behaviour seems buggy to me

> -
> $ export A='\n'
> $ echo $A
> 
> $
> -

> whereas using BASH would result in printing literally \n .

> I.e. presumingly DASH does secondary interpolation of escaped
> symols in values of environment variables. 

The echo builtin in dash differs from most other echo utilities on Linux
and *BSD in interpreting System V-like backslash escape sequences. This
is the "expansion" you are seeing and the same thing can be seen in
  echo '\n'
It is documented in dash's manual page.

This also happens in bash if you 'shopt -s xpg_echo'.

This behaviour is permitted by POSIX and required for the XSI option,
and is more commonly seen on Solaris or other System V derivatives.

The fix is to use printf(1) instead of echo(1) if there is a possibility
the string may start with '-' or contain '\'. In this case,
  printf '%s\n' "$A"

This has been asked/reported more frequently and the answer has been
that it will not be changed.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: temporary assignments vs shell function

2011-07-31 Thread Jilles Tjoelker
On Thu, Jul 14, 2011 at 12:09:15PM +0200, Rudolf Polzer wrote:
> I just retested with that very command, and yes, the FreeBSD /bin/sh
> does behave the "non-POSIX" (and "obvious") way, and so does the
> Solaris /bin/sh.

Yes, FreeBSD /bin/sh is not POSIX compliant here. Because the current
behaviour is explicitly documented and much more useful than the POSIX
behaviour, I don't really like changing it ;-)

Different from bash in non-posix mode, FreeBSD /bin/sh does persist
assignments before special builtins as required by POSIX.

The original Bourne shell is different again, though. It ignores the
assignments completely and does not even expand them (a
y=${someunsetvar?} assignment is silently ignored).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'continue' does not work in files sourced with dotcmd

2011-07-10 Thread Jilles Tjoelker
On Sat, Jul 09, 2011 at 10:07:30PM +0800, Herbert Xu wrote:
> On Sat, Jul 09, 2011 at 03:07:04PM +0200, Jilles Tjoelker wrote:
> > A fix for dash is below. The dash code is broken in a different way than
> > the FreeBSD sh code was, but the patched code is pretty much the same.
> > This makes the above test work and does not change the outcome of any
> > other tests in the FreeBSD sh testsuite.

> You're right.  But I think your patch may introduce a problem
> with a return statement inside a dot script.  That should not
> have an effect after exiting the script.

Interesting. Dash has been returning from the closest scope (function or
sourced script) for a while, but the SKIPFUNC/SKIPFILE distinction and
the comment in eval.c returncmd()

]   /*
]* If called outside a function, do what ksh does;
]* skip the rest of the file.
]*/

still gave me the impression that it behaved like older ash (also in
FreeBSD and NetBSD), trying to be bug-compatible with the Bourne shell
by having 'return' return from a function, if any, and only return from
a dot script if there is no function (because the Bourne shell gives an
error in that case).

It may be better to name the constant SKIPRETURN rather than SKIPFUNC.

> Anyway, the following patch based on your idea should fix the
> problem.

> commit 4f7e206782675b548565ca2bc82bc8c262a0f20e
> Author: Herbert Xu 
> Date:   Sat Jul 9 22:05:22 2011 +0800
> 
> [BUILTIN] Merge SKIPFUNC/SKIPFILE and only clear SKIPFUNC when leaving 
> dotcmd

Yes, this works too.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'continue' does not work in files sourced with dotcmd

2011-07-09 Thread Jilles Tjoelker
On Fri, Jul 08, 2011 at 04:09:36PM +0800, Herbert Xu wrote:
> On Thu, Jul 07, 2011 at 09:18:12AM -0600, Eric Blake wrote:
> > Meanwhile, I still have to wonder about dash behavior - if dash is _not_
> > treating continue as a syntax error, then why is it aborting the dot
> > script?  Compare:

> This is a consequence of the way continue/break/return is implemented
> in dash.

> As this construct cannot be used portably anyway, I'm not inclined to
> make any changes unless you can do it without increasing the code
> size.

It is not a hard consequence, the behaviour can be fixed fairly easily.

I noticed a similar problem in FreeBSD sh and fixed it in head in August
2010, Subversion r211349.

A testcase is (note that this must be saved to a named file because it
sources itself):

# $FreeBSD: head/tools/regression/bin/sh/builtins/break1.0 211349 2010-08-15 
21:06:53Z jilles $

if [ "$1" != nested ]; then
while :; do
set -- nested
. "$0"
echo bad2
exit 2
done
exit 0
fi
# To trigger the bug, the following commands must be at the top level,
# with newlines in between.
break
echo bad1
exit 1


Given that quite a few shells pass this test, I consider it prudent to
make it work. It is not unlikely that someone somewhere has written a
script that depends on it, and treating the break or continue as a
return is rather surprising in a negative way. (Similar for break or
continue in eval or trap, though I think those work in dash.)

A fix for dash is below. The dash code is broken in a different way than
the FreeBSD sh code was, but the patched code is pretty much the same.
This makes the above test work and does not change the outcome of any
other tests in the FreeBSD sh testsuite.

diff --git a/src/main.c b/src/main.c
index af987c6..cdd91e2 100644
--- a/src/main.c
+++ b/src/main.c
@@ -242,7 +242,8 @@ cmdloop(int top)
 
skip = evalskip;
if (skip) {
-   evalskip = 0;
+   if (skip == SKIPFILE)
+   evalskip = 0;
break;
}
}

Using
evalskip &= ~SKIPFILE;
instead of
if (skip == SKIPFILE)
evalskip = 0;
should also work and might generate shorter code.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Porting dash to OpenBSD

2011-07-03 Thread Jilles Tjoelker
On Sun, Jun 26, 2011 at 06:18:59AM +0400, Mike Korbakov wrote:
> Unfortunatly, not all systems (like OpenBSD) has built-in texttools
> like nl, that makes it difficult to port dash, with minimum troubles.

> To reduce dependencies, I suggest using instead of nl other tools like
> awk or cat (as advised openbsd porters team).

> Building dash-0.5.6.1 was successfull with this patch:

> --- src/mkbuiltins.orig Sat Jun  5 13:34:23 2010
> +++ src/mkbuiltins  Sun Jun 26 02:36:23 2011
> @@ -84,7 +84,7 @@ cat <<\!
>   */
> 
>  !
> -sed 's/-[a-z]*//' $temp2 | nl -v 0 | LC_COLLATE=C sort -u -k 3,3 |
> +sed 's/-[a-z]*//' $temp2 | awk '{ print "  " FNR-1 "  " $0 }'  | 
> LC_COLLATE=C sort -u -k 3,3 |
>  tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
> awk '{  printf "#define %s (builtincmd + %d)\n", $3, $1}'
>  printf '\n#define NUMBUILTINS %d\n' $(wc -l < $temp2)

This looks reasonable. It works here (FreeBSD with somewhat hacked dash
source), leaving the resulting binary unchanged.

A minor nit: comparing with other awk invocations, it seems more usual
to use NR rather than FNR. They both do the same thing if there is only
one input file.

The nl utility has caused more portability problems already; the code in
master has different options. This is so even though nl is in SUSv4
under the XSI option. We already use awk so that should be safer.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JOBS: fix klibc DEBUG compilation

2011-06-03 Thread Jilles Tjoelker
On Fri, Jun 03, 2011 at 07:38:27PM +0200, maximilian attems wrote:
> dash didn't compile in DEBUG mode against klibc for all long time.
> Now it fails at link stage for not having setlinebuf(3).

> Fixes:

> usr/dash/show.o: In function `opentrace':
> show.c:(.text+0x86): undefined reference to `setlinebuf'  
>   
> Signed-off-by: maximilian attems 
> ---

> the last open error, looks more like a klibc bug to me,
> will fix it there:
> show.c:(.text+0x36): undefined reference to `freopen'

So it seems, freopen() is a perfectly valid standard C function.

>  src/show.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)

> diff --git a/src/show.c b/src/show.c
> index 14dbef3..b4160e1 100644
> --- a/src/show.c
> +++ b/src/show.c
> @@ -394,7 +394,9 @@ opentrace(void)
>   if ((flags = fcntl(fileno(tracefile), F_GETFL, 0)) >= 0)
>   fcntl(fileno(tracefile), F_SETFL, flags | O_APPEND);
>  #endif
> +#ifndef SMALL
>   setlinebuf(tracefile);
> +#endif /* SMALL */
>   fputs("\nTracing started.\n", tracefile);
>  }
>  #endif /* DEBUG */

Why not just replace the non-standard setlinebuf() call with the
standard  setvbuf(tracefile, NULL, _IOLBF, 0);  ? This appears to work
just as well on FreeBSD and is C99 compliant (no POSIX needed here).

The #define SMALL is only for disabling line editing and history (using
libedit). Setting the trace file line buffered is useful regardless of
that.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC dash 0/4] Avoid a fork before running last command given to -c

2011-04-17 Thread Jilles Tjoelker
On Fri, Apr 15, 2011 at 09:07:09PM +0800, Herbert Xu wrote:
> On Sun, Apr 10, 2011 at 07:18:17AM +, Jonathan Nieder wrote:
> > Jilles Tjoelker wrote[0]:

> > > Regarding sh -c optimization, I am in favour of this. Uselessly waiting
> > > 'sh -c' processes annoy me. I made the change for FreeBSD 8.0 sh, which
> > > is very similar to dash. The SVN changeset is r194128.

> > So I grabbed that changeset with

> > svn log -v svn://svn.freebsd.org/base/head/bin/sh -r 194128
> > svn diff -r 194127:194128 svn://svn.freebsd.org/base/head/bin/sh

> > and made it a tiny bit smaller.   Here's the result.

> >textdata bss dec hex filename
> >   839941784   11128   96906   17a8a dash.before-O2
> >   839941784   11128   96906   17a8a dash.before-Os
> >   841461784   11128   97058   17b22 dash.after-O2
> >   841461784   11128   97058   17b22 dash.after-Os

> > On this amd64 the cost is 152 bytes of text.  Thoughts?

> I must say that I don't see much value in this feature.  Adding
> exec to the invocation is trivial.

It is trivial when writing command lines that are obviously going to be
passed to sh -c, but in practice it is often not done. The optimization
would be useful with system(), popen() and Makefiles; rarely does one
see an "exec" in such contexts. In a Makefile, "exec" can be actively
detrimental since it usually forces the command to be run using the
shell, preventing a direct execve() by make.

In all contexts, "exec" is detrimental if a builtin version of the
executed utility exists. If the utility is a special builtin, prepending
"exec" is very likely to cause the command to stop working, and
otherwise it adds a useless execve().

It was proposed to add text encouraging "exec" prepending to POSIX, but
this was rejected. See http://austingroupbugs.net/view.php?id=236 and
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/1918 .

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: shift "fatal error"

2011-03-15 Thread Jilles Tjoelker
On Thu, Mar 10, 2011 at 08:23:33PM +0100, Guido Berhoerster wrote:
> * Dan Muresan  [2011-03-10 19:41]:
> > Hi, is there some consensus on whether shift should cause a "fatal
> > error" as reported by Herbert against bash:
> > 
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=252378
> > 
> > # doesn't print anything
> > dash -c 'shift 2; echo hi'
> > 
> > My copy of SUSv3 doesn't seem to imply any "fatal error" handling
> > requirement for shift:
> > 
> > --- 8-X ---
> > EXIT STATUS
> > 
> > The exit status is >0 if n>$#; otherwise, it is zero.
> > 
> > CONSEQUENCES OF ERRORS
> > 
> > Default.
> > --- 8-X ---

> For IEEE Std 1003.1-2008 see section 2.8.1 "Consequences of
> Shell Errors": a "utility syntax error (option or operand error)"
> with special built-ins shall cause the shell to exit. That's what
> dash, ksh93, and pdksh do.

That may have been the intention (considering that it matches the real
Bourne shell in addition to various flavours of Korn shell), but that is
not how I would interpret it. A too high shift count still seems
syntactically valid to me.

A statement about the exit status in a particular error situation also
usually indicates that the shell shall not abort.

Examples of shift commands that I think shall cause the shell to abort:
shift -S  # unless a -S option is supported as an extension
shift x   # unless arithmetic expressions are accepted as an extension
shift @   # unless there is some strange extension

The FreeBSD sh shift special builtin behaves this way (a too high shift
count is not fatal but a syntactically invalid number is). This was done
a few years ago when the syntax-error property of special builtins was
implemented, as making a too high shift count fatal caused a configure
script in the FreeBSD base system to fail.

I would not encourage the extension of accepting arithmetic expressions
because that requires a subtly different kind of arithmetic expression
without octal constants. POSIX is clear that 'shift 010' should shift
ten positions, not eight, while 'shift $((010))' should shift eight
positions.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [PARSER] Remove backslash before } in double-quotes in variable

2011-03-11 Thread Jilles Tjoelker
On Thu, Mar 10, 2011 at 04:59:27PM +0800, Herbert Xu wrote:
> On Sun, Nov 21, 2010 at 02:42:22PM +0100, Jilles Tjoelker wrote:
> > The backslash prevents the closing brace from terminating the
> > substitution, therefore it should be removed.
> > 
> > FreeBSD sh test expansion/plus-minus2.0 starts working, no other tests
> > are affected.
> > 
> > Example:
> >   printf "%s\n" ${$+\}} ${$+"\}"} "${$+\}}"
> > should print } three times, without backslashes.

> I agree with respect to the last one, but not the middle one.

> ${$+"\}"} should be identical in behaviour to "\}", which with
> all shells I have access to produces a brace with a backslash.

> So please update your patch so that we do not regress on the
> second one.

Meh. Now that I read http://austingroupbugs.net/view.php?id=221#c399
again, that particular bit is not so clear anymore. :(
That change only seems to affect parameter expansions inside
double-quotes, not double-quotes inside parameter expansions.

On the other hand, removing the backslash makes some sense because
${$+"}"} is unspecified (even though it expands to } in many shells). In
dash in particular it has an undesirable effect. When I wrote the code
to remove the backslash in FreeBSD sh, the first } ended the
substitution which as a result ended double-quoted although it started
unquoted; I changed this later so that the first } is literal, but did
not change the treatment of \} in that case.

My patch for dash goes further than what I did in FreeBSD in that it
also affects the four varieties of parameter expansion for substring
processing (#/##/%/%%). I think that is wrong and the FreeBSD behaviour
is correct. In the standard as modified by the aforementioned
interpretation, ${v#"}"} and ${v#'}'} seem valid (the pattern is a
literal closing brace). Therefore there seems no reason to remove the
backslash in ${v#"\}"}, even though ksh93 does it.

Distinguishing #/##/%/%% from the others at parse time is not possible
with the way dash's parser.c currently works. So a simple patch would
attempt to change \} inside "${...}" only (test dqvarnest instead of
varnest). I think that does not affect the end result for #/##/%/%% so
perhaps it can be committed as an small improvement.

At some point a larger change is probably necessary to fix things like
"${v#'a'}" which should trim the letter a, not single-quote a
single-quote. This works in FreeBSD 9 sh but I think its
Bourne/Korn-like approach to POSIX-unspecified constructs like
"${v+"what*is*this"}", which generates pathnames, just like the original
ash and older FreeBSD code, may not be appreciated in dash.

Some examples:

FreeBSD 9 sh does this:

% sh
$ v='a\}b'
$ echo ${v%"\}"*}
a
$ echo ${v+"\}"}
}

while dash with my original patch and ksh93 do this:

% ksh93
$ v='a\}b'
$ echo ${v%"\}"*}
a\
$ echo ${v+"\}"}
}

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [BUILTIN] Fix corruption of reads with byte 0x81

2011-03-10 Thread Jilles Tjoelker
On Thu, Mar 10, 2011 at 09:01:45PM +0800, Herbert Xu wrote:
> On Thu, Feb 24, 2011 at 11:43:44AM +, Alexey Gladkov wrote:
> > Starting with commit 55c46b dash removes CTLESC bytes ('\x81')
> > from read sequence. This leads to breakage of some UTF8
> > characters. Like in commit f8231a, this change fixes corruption
> > by removing the faulty code.

> Thanks for the diagnosis and patch!

> Unfortunately we can't just delete the rmescaps call since we do
> use CTLESC to represent backslash characters in the input stream
> which prevents field splitting.

> So the correct fix is to add extra CTLESCs wherever CTLESC appears
> in the input.  The following patch should fix the problem.

That is not how ifsbreakup() works. As I have written in FreeBSD sh
expand.c:

/*
 * Break the argument string into pieces based upon IFS and add the
 * strings to the argument list.  The regions of the string to be
 * searched for IFS characters have been stored by recordregion.
 * CTLESC characters are preserved but have little effect in this pass
 * other than escaping CTL* characters.  In particular, they do not escape
 * IFS characters: that should be done with the ifsregion mechanism.
 * CTLQUOTEMARK characters are used to preserve empty quoted strings.
 * This pass treats them as a regular character, making the string non-empty.
 * Later, they are removed along with the other CTL* characters.
 */

The ifsbreakup() function works the same way in dash. (One reason is
that this allows using the CTL* bytes in IFS, although it may not be
that useful because of the prevalence of UTF-8.)

So while this patch fixes corruption with byte 0x81, backslashes
continue to have no effect at all. Instead, all non-backslashed
characters should be marked with recordregion(), leaving CTLESC
prefixing for CTLESC only.

Apart from that, there is corruption with byte 0x88, CTLQUOTEMARK. I
think that can be fixed in the same way by prefixing with CTLESC.

By the way, in the data pointed to by NARG nodes, dash does use CTLESC
for backslashed characters that should not be IFS splitting points,
which is only relevant for WORD in ${VAR+WORD} and ${VAR-WORD}. A
downside of this is that quoted and unquoted CTL* bytes cannot be
distinguished; therefore I have solved this differently in FreeBSD.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: jobs in scripts?

2011-03-09 Thread Jilles Tjoelker
On Tue, Mar 08, 2011 at 09:02:30PM +0200, Dan Muresan wrote:
> Hi, is the following normal? With set -m the problem disappears; bash
> & ksh don't need set -m.

> SUSv3 doesn't say job control is to be disabled completely unless "set
> -m" is active. And, if job control *is* disabled, why does "job"
> report a non-empty list?

> 0.5.5.1-3ubuntu2 on Ubuntu 10.04 here.

> ---

> $ cat x
> #set -m
> sleep 100 &
> echo "Before kill:"; jobs -l
> kill -s TERM %1
> sleep 0.5
> echo "After kill:"; jobs -l
> pidof sleep

> $ dash x
> Before kill:
> [1] + 18775 Running
> kill: 4: No such process

> After kill:
> [1] + 18775 Running
> 18775

See the description of "set" in XCU 2.14 Special Built-In Utilities. It
is deliberate that "jobs" still works, as "set -m" only enables putting
jobs in separate process groups and various notices that are only
meaningful in interactive mode (there are no prompts in non-interactive
mode, therefore the notices about completed and stopped background jobs
do not happen, and if a foreground job stops in non-interactive mode
this tends to lead to unexpected results).

That much is clear. However, "kill" and "jobs -p" are less clear. Their
descriptions depend on the process group of a job, which does not exist
if job control was disabled when the job was started -- in that case the
job is run in the shell's process group. This could be interpreted as an
implicit requirement on the application that the job must have been
started with job control enabled if "kill" or "jobs -p" are to be used
on it.

In your case, I recommend getting the PID via $!. This allows 'kill'
without depending on obscure corners of shells.

If your script runs for a long time, use of $! should be paired with
'wait' (either for all $! values obtained or without parameters) so that
the shell can forget about old jobs. Some shells are buggy and remember
the jobs anyway.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Port to Solaris

2011-01-21 Thread Jilles Tjoelker
On Tue, Jan 18, 2011 at 09:15:40PM -0800, Brian Koropoff wrote:
> - Older Solaris does not support %jd (intmax_t) in format
>   strings, but it does support %lld (long long), which is
>   the same size on all architectures it supports.  Do
>   a configure check for the sizes of both and prefer %lld
>   when it is safe to do so.

Pedantically, that's a regression for systems that do support %jd, and I
expect compilers to warn about it. If %jd is available, it must be used
for intmax_t, and not %lld, because that is for long long, not intmax_t.

What you can do is use PRIdMAX from , normally defined as
"jd". You can then define this to "lld" or "jd" if it is not defined.
I think this makes the code uglier (just like your change), but oh well.

> - Older Solaris lacks stdint.h, but inttypes.h provides the
>   same types and works on all platforms I've tried dash on,
>   so just use it instead.

 is defined to be a superset of , so that is ok.
It is also needed for the PRIdMAX suggestion above.

> [...]
> diff --git a/src/arith_yacc.c b/src/arith_yacc.c
> index 6c5a720..bf21830 100644
> --- a/src/arith_yacc.c
> +++ b/src/arith_yacc.c
> @@ -33,7 +33,6 @@
>   */
>  
>  #include 
> -#include 
>  #include 
>  #include "arith_yacc.h"
>  #include "expand.h"

This is useful regardless as the  is redundant. The
 is already needed here because of imaxdiv().

By the way, I wonder what the advantage of imaxdiv() above separate %
and / is. Compilers can detect the matching between a % b and a / b and
do it in one operation, and any use of imaxdiv() trips gcc's
-Waggregate-return.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: setvar MIA?

2011-01-11 Thread Jilles Tjoelker
On Tue, Jan 11, 2011 at 10:00:33AM -0700, Eric Blake wrote:
> On 01/11/2011 09:54 AM, Aragon Gouveia wrote:
> > I wasn't sure of its status in POSIX.  It is useful for declaring
> > variable variables - tidier than eval and I imagine faster, eg.

> > index="1"
> > setvar var_${index} "value"

> > Will emulate it with a local function - thanks.

> Indeed, it looks like FreeBSD introduced it as shorthand for:

> setvar() { eval $1=\$2; }

> The speed difference between that function doing an eval and a shell
> builtin would be in the noise.  I don't know why FreeBSD even bothered
> to pollute the namespace with a builtin like that.

The setvar builtin was already present and documented in the initial
ash. FreeBSD simply inherited it. Dash inherited it too, but it was
removed in dash-0.4.14, 3 Apr 2003.

The use of Almquist's additions like this one is certainly questionable;
many of them have been removed. However, setvar has been available and
documented in /bin/sh in all versions of FreeBSD, which makes removal
less likely at this point.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read() builtin doesnt read integer value /proc files (but bashs does)

2010-12-18 Thread Jilles Tjoelker
On Wed, Dec 15, 2010 at 12:55:51PM -0600, Jonathan Nieder wrote:
> Cristian Ionescu-Idbohrn wrote:
> > On Sun, 28 Nov 2010, Herbert Xu wrote:
> > > On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote:

> >>> This discarding is still bad as it throws away valid data if the open
> >>> file description is shared. This happens if stdin is redirected inside a

> >> I'm with Jilles on this.  I also don't particularly feel like
> >> bloating dash just because of the borked /proc interface when
> >> there is a perfectly adequate work-around in "cat".

> >>value=$(cat /proc/file)

> > I wouldn't call that "a perfectly adequate work-around", but a painful and
> > unadequate work-around.

> For what it's worth, here's what bash does (based on strace):

> 1. Determine whether the file is seekable.  That is, seek using
> SEEK_CUR with offset 0.

> 2. If seekable, read a nice big chunk and then seek back to put the
> file offset back in the right place.  If not seekable, read one byte
> at a time.

> This works in /proc because files in /proc are seekable.

> That said, I don't think borked /proc is a great reason to do this
> (it's a better reason to fix /proc).  Speeding up the read builtin
> might be a good reason.

The optimization is of limited benefit (still way more syscalls than
strictly necessary) and does not apply to the common use case of reading
from a pipe. Generally, if 'read' is too slow, it is better to spend a
fork on a tool like grep, sed or awk which processes large amounts of
text much more efficiently.

As for value=$(cat /proc/file), there are at least two ways to make this
faster. The traditional ksh way is the extension value=$(http://vger.kernel.org/majordomo-info.html


[PATCH] [EXPAND] Do not split the result of tilde expansion

2010-12-05 Thread Jilles Tjoelker
A tilde expansion generates a valid pathname. Splitting it using IFS
either leaves it unchanged or changes it to something unintended.

This fixes FreeBSD sh test expansion/tilde1.0 and does not change the
outcome of the other tests.

This fixes Debian bug #601096.

Example:
  IFS=m HOME=/tmp; printf "%s\n" ~
---
 src/expand.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/src/expand.c b/src/expand.c
index 1b77b7c..60d4798 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -395,7 +395,6 @@ done:
*p = c;
startloc = expdest - (char *)stackblock();
strtodest(home, SQSYNTAX, quotes);
-   recordregion(startloc, expdest - (char *)stackblock(), 0);
return (p);
 lose:
*p = c;
-- 
1.7.3.2

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PARSER] Remove backslash before } in double-quotes in variable

2010-11-21 Thread Jilles Tjoelker
The backslash prevents the closing brace from terminating the
substitution, therefore it should be removed.

FreeBSD sh test expansion/plus-minus2.0 starts working, no other tests
are affected.

Example:
  printf "%s\n" ${$+\}} ${$+"\}"} "${$+\}}"
should print } three times, without backslashes.
---
 src/parser.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/src/parser.c b/src/parser.c
index be20ff7..25f22fd 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -916,6 +916,9 @@ readtoken1(int firstc, char const *syntax, char *eofmark, 
int striptabs)
c != '$' && (
c != '"' ||
eofmark != NULL
+   ) && (
+   c != '}' ||
+   varnest == 0
)
) {
USTPUTC('\\', out);
-- 
1.7.3.2

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [REDIR] Replace GPL noclobberopen code with the FreeBSD version.

2010-11-20 Thread Jilles Tjoelker
Replace noclobberopen() from bash with the FreeBSD code for noclobber
opens.

This also reduces code size by eliminating an unnecessary check.
---
 src/redir.c |   77 +++---
 1 files changed, 15 insertions(+), 62 deletions(-)

diff --git a/src/redir.c b/src/redir.c
index b4e49c0..f96a76b 100644
--- a/src/redir.c
+++ b/src/redir.c
@@ -84,7 +84,6 @@ STATIC void dupredirect(union node *, int, char[10]);
 STATIC void dupredirect(union node *, int);
 #endif
 STATIC int openhere(union node *);
-STATIC int noclobberopen(const char *);
 
 
 /*
@@ -168,6 +167,7 @@ redirect(union node *redir, int flags)
 STATIC int
 openredirect(union node *redir)
 {
+   struct stat64 sb;
char *fname;
int f;
 
@@ -186,8 +186,21 @@ openredirect(union node *redir)
/* Take care of noclobber mode. */
if (Cflag) {
fname = redir->nfile.expfname;
-   if ((f = noclobberopen(fname)) < 0)
+   if (stat64(fname, &sb) < 0) {
+   if ((f = open64(fname, O_WRONLY|O_CREAT|O_EXCL, 
0666)) < 0)
+   goto ecreate;
+   } else if (!S_ISREG(sb.st_mode)) {
+   if ((f = open64(fname, O_WRONLY, 0666)) < 0)
+   goto ecreate;
+   if (fstat64(f, &sb) < 0 && S_ISREG(sb.st_mode)) 
{
+   close(f);
+   errno = EEXIST;
+   goto ecreate;
+   }
+   } else {
+   errno = EEXIST;
goto ecreate;
+   }
break;
}
/* FALLTHROUGH */
@@ -397,66 +410,6 @@ savefd(int from, int ofd)
 }
 
 
-/*
- * Open a file in noclobber mode.
- * The code was copied from bash.
- */
-int
-noclobberopen(fname)
-   const char *fname;
-{
-   int r, fd;
-   struct stat64 finfo, finfo2;
-
-   /*
-* If the file exists and is a regular file, return an error
-* immediately.
-*/
-   r = stat64(fname, &finfo);
-   if (r == 0 && S_ISREG(finfo.st_mode)) {
-   errno = EEXIST;
-   return -1;
-   }
-
-   /*
-* If the file was not present (r != 0), make sure we open it
-* exclusively so that if it is created before we open it, our open
-* will fail.  Make sure that we do not truncate an existing file.
-* Note that we don't turn on O_EXCL unless the stat failed -- if the
-* file was not a regular file, we leave O_EXCL off.
-*/
-   if (r != 0)
-   return open64(fname, O_WRONLY|O_CREAT|O_EXCL, 0666);
-   fd = open64(fname, O_WRONLY|O_CREAT, 0666);
-
-   /* If the open failed, return the file descriptor right away. */
-   if (fd < 0)
-   return fd;
-
-   /*
-* OK, the open succeeded, but the file may have been changed from a
-* non-regular file to a regular file between the stat and the open.
-* We are assuming that the O_EXCL open handles the case where FILENAME
-* did not exist and is symlinked to an existing file between the stat
-* and open.
-*/
-
-   /*
-* If we can open it and fstat the file descriptor, and neither check
-* revealed that it was a regular file, and the file has not been
-* replaced, return the file descriptor.
-*/
-if (fstat64(fd, &finfo2) == 0 && !S_ISREG(finfo2.st_mode) &&
-finfo.st_dev == finfo2.st_dev && finfo.st_ino == finfo2.st_ino)
-   return fd;
-
-   /* The file has been replaced.  badness. */
-   close(fd);
-   errno = EEXIST;
-   return -1;
-}
-
-
 int
 redirectsafe(union node *redir, int flags)
 {
-- 
1.7.3.2

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash "set -e" breaks "trap INT"

2010-11-13 Thread Jilles Tjoelker
On Wed, Nov 10, 2010 at 03:49:19PM +0200, Dan Muresan wrote:
> I've submitted a bug regarding trap + "set -e" on Ubuntu Launchpad
> [1], but that's probably not the best place to talk about the issue.

> I'm using dash 0.5.5.1-3ubuntu2 from Ubuntu Lucid on an x86 machine.
> With "set -e", only the first command in an INT trap handler gets
> executed -- UNLESS that first command is a call to a function that
> returns 0. Here's an example of the problem:

> --- 8-x ---

> #!/bin/sh
> set -e
> 
> trap 'hnd' INT
> trap 'echo EXIT' EXIT
> 
> zero() {
>   return 0;
> }
> 
> hnd() {
>   #zero
>   echo "Ignored 1"; echo "Ignored 2"
>   sleep 1
>   echo Back
> }
> 
> for i in 3 2 1; do
> echo "$i"
> sleep 1 # || true
> done
> 
> echo OUT

> --- 8-x ---

> If we uncomment either the call to zero, or the || true check, the
> entire handler gets executed. A set +e inside the handler makes no
> difference.

> My workaround at the moment is to trap both INT and EXIT (I'm not
> going to rely on the "zero" bit of magic). I suppose this workaround
> will unfortunately have to stick around for a while -- even if this
> bug gets fixed, scripts can't assume they are running on the latest
> version of dash.

> [1] https://bugs.launchpad.net/ubuntu/+source/dash/+bug/673119

I can reproduce this with v0.5.5.1 but not with v0.5.6 or master, so
apparently it has been fixed.

The exit after the INT trap has been executed also occurs in all other
shells I tried (newer dash, FreeBSD sh, mksh, true Bourne shell), except
ksh93 which does not execute the INT trap at all, only the EXIT trap.
The cause is that sleep also gets the SIGINT and terminates on it, which
can be seen by removing  set -e  and adding  echo rc=$?  after sleep.

ksh93 has a sleep builtin, but this has no effect on above. Behaviour
remains the same with /bin/sleep.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quoted closing brace in variable default expansion

2010-11-13 Thread Jilles Tjoelker
On Sat, Nov 13, 2010 at 10:41:47AM -0600, Jonathan Nieder wrote:
> Harald van Dijk wrote:

> >   $ ksh -c 'echo "${x:-"}"}"'
> >   }
> >   $ dash -c 'echo "${x:-"}"}"'
> >   dash: 1: Syntax error: Unterminated quoted string
> >   $ busybox sh -c 'echo "${x:-"}"}"'
> >   sh: syntax error: unterminated quoted string

> > It looks like dash and other ash derivatives stop the expansion with
> > the first }, instead of the first unquoted }. I'm getting confused
> > trying to figure out whether this is a bug in dash or in the script
> > relying on it.

> The answer depends on how portable the script is meant to be.  If
> the goal is to be portable to shells implementing future versions of
> the POSIX standard, there seems to be have been an interpretation[1]
> approved for the next major revision:

>  http://austingroupbugs.net/view.php?id=221
>  note #399

> which would make the example nonconformant because there are an odd
> number of unescaped double-quotes before the first unescaped closing
> brace.

Correct.

> See http://thread.gmane.org/gmane.comp.shells.dash/262/focus=263 for
> a nice summary (thanks, Jilles!).

Note that I have changed FreeBSD 9 sh since I wrote that. To preserve
sanity in expand.c, a substitution now ends with the same quoting level
as it started. Closing braces that do not conform to this are taken
literally. Making the parole configure script do what the author
expected was a bonus.

The change was needed to implement word splitting in WORD in ${v+WORD}.
For that, expand.c needs to know the quoted state of any character
of a word in ${v+-word}. Like NetBSD sh and dash have done, I added a
new magic character CTLQUOTEEND that shows the end of the quoted part
and interpreted CTLQUOTEMARK as the start. If I now have something like
  ${v-abc${vv-def"ghi}jkl"}
I would have to pass through the fact that the quotes were opened in
${vv-def"ghi} to the outer substitution, and I did not want that. The
change avoids it.

FreeBSD 9 sh is now very close to ksh93 in treatment of POSIX
substitutions and I am happy with this. (One of the deliberate
differences is that the above incorrect construct is flagged as an error
(missing '}') while ksh93 treats it as an incomplete command.)

As for the portable alternative, split up the command. Either put the
'}'-containing thing in a variable and use that as alternative or use an
explicit conditional. The Autoconf documentation has more information,
for example.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improved LINENO support

2010-11-12 Thread Jilles Tjoelker
On Fri, Nov 12, 2010 at 08:07:41PM +0059, Harald van Dijk wrote:
> must not write to test-7 and test2-8. It may write to test-1 and 
> test2-1, or it may write to test-8 and test2-8. Perhaps even to test-7 
> and test2-7. Again, though, LINENO must be the same in both expansions. 
> So for dash, the problem doesn't change: storing the line number of each 
> word, rather than each command, makes it very hard to get LINENO right.

Hmm, I hadn't thought of that. I considered a LINENO that points to the
affected word most useful and easiest to implement (also because FreeBSD
sh's LINENO works this way).

So some more ideas:

A per-command LINENO does not require adding the number to all of the
node types. Only node types that are commands that perform expansions
need it: NCMD, NREDIR, NBACKGND, NSUBSHELL, NFOR, NCASE.

NFOR and NCASE may be wrapped in an NREDIR node to hold the line number.
This simplifies the code a bit at the cost of some extra memory usage.

A further simplification might be removing the redirect field from
NBACKGND and NSUBSHELL and putting any redirect in an additional NREDIR
node inside the NBACKGND/NSUBSHELL (so that the file is opened in the
child process and job control works correctly). However, NREDIR,
NBACKGND and NSUBSHELL are compatible types so it does not seem to gain
much.

What about making the lineno an element of the redirection list like
NTO? This minimizes changes to eval.c but perhaps it is too contrived.

Second idea: use a per-word LINENO but somehow ensure it is the same for
all words in a command. I think this is wrong, but perhaps I'm wrong.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improved LINENO support

2010-11-10 Thread Jilles Tjoelker
On Mon, Nov 08, 2010 at 09:32:56PM +0059, Harald van Dijk wrote:
> This patch improves LINENO support by storing line numbers in the parse 
> tree, for commands as well as for function definitions. It makes LINENO 
> behaves properly when calling functions, and has the added benefit of 
> improved line numbers in error messages when the last-parsed command is 
> not the last-executed one. It removes the earlier LINENO support, and 
> instead sets LINENO from evaltree when a command is executed. I have 
> used it for a while, and have seen correct line numbers in basic testing 
> and in error messages in fairly large and complicated autoconf-generated 
> shell scripts. Does this look good enough to add to dash? If there are 
> any problems with it, please let me know, I will be happy to look at 
> this further.

What I was thinking of was adding a lineno field to narg instead of to
all command types. The lineno variable would then be set by expand.c. I
think that leads to a smaller patch and it should still give a sensible
value for almost all errors. Downside is a few more int-to-ascii
conversions. A few divisions is not that bad relative to other things
that are done but printf in modern libcs is bloated and slow and might
slow down things measurably.

FreeBSD's code subtracts the function's starting line number in the
parser rather than at run time. I wonder which is best. (FreeBSD
neglects to save and restore the value, so nested function definitions
may lead to incorrect line numbers, but that could be fixed.)

I think this patch will increase dash's code size at least a little,
which is always a consideration here. I can also look at getting this
into FreeBSD sh.

Thanks,
-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: static vs. dynamic scoping

2010-11-10 Thread Jilles Tjoelker
On Wed, Nov 10, 2010 at 03:15:59PM +0100, ольга крыжановская wrote:
> Eric Blake wrote:
> > Or should dash switch entirely to static scoping (my gut feel is that
> > static scoping may be more efficient to implement, which fits in line
> > with dash's desire to be as lean as possible)?

> static scoping is much more efficient. Think about accessing a global
> variable in a deep function call stack, e.g. 20 functions deep. With
> dynamic scoping you have to look up the list of local variables for
> each function *first* before looking at the global variables. In my
> example this means 20 look ups. Each further function adds another
> look up.
> For static scoping you just look at the local variables of the
> *current* function and then look up the global ones. This is faster
> and much more efficient.

That's not how dash implements it, though. The local builtin saves the
value and attributes of the variable and they are restored upon return.
There is just one table of variables to search (two in older versions
and other ash versions, the additional one storing variable assignments
for regular builtins like IFS= read x; these were converted to the local
mechanism recently).

The real efficiency benefit of static scoping comes when disallowing
access to locals when the name is not known at compile time: at run
time, the names of the locals are not necessary.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bug with <&- under ulimit -n

2010-10-27 Thread Jilles Tjoelker
On Wed, Oct 27, 2010 at 03:33:41PM -0600, Eric Blake wrote:
> Dash does not behave well when under artificial fd pressure due to
> ulimit -n.  It insists on copying a to-be-closed fd to another fd
> greater than 10, then complains when the dup fails, rather than just
> flat-out closing the fd in the first place.  Compare this with ksh93

> $ ksh -c 'ulimit -n 10; : <&-'; echo $?
> 0
> $ dash -c 'ulimit -n 11; : <&-'; echo $?
> 0
> $ dash -c 'ulimit -n 10; : <&-'; echo $?
> dash: 0: Invalid argument
> 2

> See this thread on the bug-tar list for more details:
> http://thread.gmane.org/gmane.comp.gnu.tar.bugs/4010/focus=4020

Summary: use 'exec' to close the file descriptors before setting ulimit
-n ridiculously low instead of relying on redirections to work in such a
restricted environment.

By the way, there is also a bug in some ash versions that fails when
closing an fd (via <&- or >&-) that is not open, except if the
redirection is for 'exec' or an external program.

However, some of the gory details are below.

fcntl(F_DUPFD) fails with [EINVAL] if arg (the minimum for the new file
descriptor) exceeds the ulimit. The FreeBSD kernel, at least, checks
this before checking if the existing file descriptor is open. Therefore
any redirection attempt that must allow for the possibility of restoring
the old open file requires a ulimit -n of at least 11.

That the ksh93 example works is due to ksh93 being very clever (too
clever for its own good, sometimes, if you ask me). It thinks that it
has no need to save the old open file if this is the last command and
there are no trap handlers.

So the following fail (on a FreeBSD 8 kernel):

$ ksh93 -c 'ulimit -n 10; : <&-; :'; echo $?
ksh93[1]: open file limit exceeded [Invalid argument]
1
$ ksh93 -c 'trap "echo hi" EXIT; ulimit -n 10; : <&-'; echo $?
ksh93[1]: open file limit exceeded [Invalid argument]
hi
1

Where the cleverness fails is when that last command sets a trap handler
itself:

$ ksh93 -c ':; trap "echo hi" EXIT >/dev/null'
$

The other shells I tried correctly print "hi".

A further difference between shells is that some shells perform
redirections for a simple command before looking at what kind of command
it is, therefore always saving the old open files, while others postpone
this until after looking what kind of command it is, not saving the old
open files for external programs because the open happens in the child
process (potentially also for 'exec').

The former behaviour is shown by most kinds of Korn shell and dash. The
latter behaviour is shown by the Bourne shell, most other ash
derivatives, bash and zsh.

A command like
  (ulimit -n 10; /bin/foo 3<&- 4<&-; :)
will only work in a shell that uses the latter behaviour.

I consider both behaviours valid and shell script that depends on either
of them broken. They both have historical basis and their own advantages
and disadvantages.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: exit-ing in a trap handler uses the wrong exit code

2010-09-25 Thread Jilles Tjoelker
On Sat, Sep 25, 2010 at 01:59:00PM +0200, Loïc Minier wrote:
>  qemu's configure uses a trap handler which calls exit; configure would
>  call exit 1, but with the trap handler in place the configure script
>  would return with exit 0.

>  dash -c 'trap "sh -c \"exit 4\"; exit" 0 1 2 3 9 11 13 15;exit 3'; echo $?
>  4

>  Blue Swirl on the qemu-devel list points out that according to POSIX
>  this should be 3:
>  
> http://www.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_21

>  bash and zsh get this right:
>  bash -c 'trap "sh -c \"exit 4\"; exit" 0 1 2 3 9 11 13 15;exit 3'; echo $?
>  3

>  I've logged https://bugs.launchpad.net/ubuntu/+source/dash/+bug/647450
>  to track this as I'm seeing this with the Ubuntu version of dash.

You are certainly right that POSIX requires this to be handled
differently, but there are various other shells that are non-compliant
in the same way as dash and it is easy to avoid the behaviour, so I
would not recommend depending on it.

Other shells where 'exit' is equivalent to 'exit $?' even in traps:
NetBSD /bin/sh, FreeBSD /bin/sh (both of these are related to dash),
pdksh, mksh, the original Bourne shell (from the Heirloom toolchest).
Of course this is no guarantee that these shells will not be changed to
be compliant later on, but many deployed systems use a /bin/sh that is
not compliant in this regard.

To avoid the behaviour, only use the exit special builtin in a trap
action for EXIT if you want to override the exit status. If you do not
use exit, all shells preserve the exit status.

For signals, the trap action should typically terminate the process,
either via exit with an explicit non-zero status, or (more properly) by
resetting the trap to its default action and resending the signal. Using
exit without parameters in a signal trap action appears to return status
0 in bash and zsh, which seems less than useful. Only in ksh93 does this
do something useful: it resets the signal to its default action and
resends it; however, if the signal is one that is ignored by default
ksh93 hangs forever.

POSIX says that the same applies to the return special builtin (even
though POSIX only specifies returning from functions or dot scripts, not
from shell instances). This does not appear to make much sense to me,
and bash does not implement it (it does not allow returning from shell
instances either, while dash and ksh93 do).

Further remarks:
* Trying to trap SIGKILL produces undefined results, don't do it.
* Trapping SIGSEGV may cause unexpected results unless the signal was
sent via kill(), sigqueue() or similar. Ash variants return from the
signal handler after setting a flag, which usually causes an infinite
loop as the problematic instruction is retried over and over again. They
do not use a signal stack (sigaltstack()), so a stack overflow will
terminate the process immediately. In any case, if the shell itself gets
an actual SIGSEGV, its state is probably too messed up to continue
executing a script.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix UTF-8 issues in read() builtin

2010-09-07 Thread Jilles Tjoelker
On Wed, Sep 08, 2010 at 01:26:15AM +0400, Alexey Zinovyev wrote:
> Hello, I think there is a bug in read() builtin.

> $ cat test
> echo 'ρ'|while read i; do echo $i; done
> $ dash test

> $ bash test
> ρ

> Same with some japanese symbols.
> Looks like dash strips 0x81 byte. 

0x81 == CTLESC, the escape character in dash's internal representation.

> diff --git a/src/miscbltin.c b/src/miscbltin.c
> index 5ab1648..f8c5655 100644
> --- a/src/miscbltin.c
> +++ b/src/miscbltin.c
> @@ -101,7 +101,6 @@ readcmd_handle_line(char *line, char **ap, size_t len)
>* will not modify the length of the string */
>   offset = sl->text - s;
>   remainder = backup + offset;
> - rmescapes(remainder);
>   setvar(*ap, remainder, 0);
>  
>   return;

This patch is not correct as it will leave 0x81 bytes for backslash
escapes. That is probably a bit worse than ignoring the backslashes
entirely, which is what it does now. It attempts to "escape" the next
character by placing a CTLESC, but CTLESC does not and should not escape
IFS characters for ifsbreakup(); the recordregion() mechanism should be
used for that.

(For the intermediate representation generated by parser.c, CTLESC does
escape IFS characters. This is not ideal as it prevents IFS splitting
with CTL* bytes in word in ${var+-word}.)

The patch I posted separately fixes the handling of 0x81 and various
other issues with read (by using separate code instead of trying to use
expand.c). Backslash escaping works too although I have just found some
bugs with corner cases.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [BUILTIN] Fix various issues with read by importing the NetBSD/FreeBSD code.

2010-09-07 Thread Jilles Tjoelker
This reverts the change to make read use the code from expand.c.
Although read's splitting is similar to the splitting done as part of
word expansions, the differences appear to add more complexity than the
common code saves.

The new code is from FreeBSD (which was originally taken from NetBSD).
The -e and -t options have been left out. Some special handling for
EINTR which is not necessary in FreeBSD sh has been retained from the
older dash code.

We now pass FreeBSD's builtins/read1.0 test, posh's mksh.t:read-IFS-1
test and gsf's ifs.sh.
---
 src/miscbltin.c |  170 ++-
 1 files changed, 92 insertions(+), 78 deletions(-)

diff --git a/src/miscbltin.c b/src/miscbltin.c
index 5ab1648..fc6e8d4 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -62,69 +62,22 @@
 #undef rflag
 
 
-/** handle one line of the read command.
- *  more fields than variables -> remainder shall be part of last variable.
- *  less fields than variables -> remaining variables unset.
- *
- *  @param line complete line of input
- *  @param ap argument (variable) list
- *  @param len length of line including trailing '\0'
- */
-static void
-readcmd_handle_line(char *line, char **ap, size_t len)
-{
-   struct arglist arglist;
-   struct strlist *sl;
-   char *s, *backup;
-
-   /* ifsbreakup will fiddle with stack region... */
-   s = grabstackstr(line + len);
-
-   /* need a copy, so that delimiters aren't lost
-* in case there are more fields than variables */
-   backup = sstrdup(line);
-
-   arglist.lastp = &arglist.list;
-   recordregion(0, len - 1, 0);
-   
-   ifsbreakup(s, &arglist);
-   *arglist.lastp = NULL;
-   removerecordregions(0);
-
-   for (sl = arglist.list; sl; sl = sl->next) {
-   /* remaining fields present, but no variables left. */
-   if (!ap[1]) {
-   size_t offset;
-   char *remainder;
-
-   /* FIXME little bit hacky, assuming that ifsbreakup 
-* will not modify the length of the string */
-   offset = sl->text - s;
-   remainder = backup + offset;
-   rmescapes(remainder);
-   setvar(*ap, remainder, 0);
-
-   return;
-   }
-   
-   /* set variable to field */
-   rmescapes(sl->text);
-   setvar(*ap, sl->text, 0);
-   ap++;
-   }
-
-   /* nullify remaining arguments */
-   do {
-   setvar(*ap, nullstr, 0);
-   } while (*++ap);
-}
-
 /*
- * The read builtin.  The -e option causes backslashes to escape the
- * following character. The -p option followed by an argument prompts
+ * The read builtin.  The -r option causes backslashes to be treated like
+ * ordinary characters. The -p option followed by an argument prompts
  * with the argument.
  *
  * This uses unbuffered input, which may be avoidable in some cases.
+ *
+ * Note that if IFS=' :' then read x y should work so that:
+ * 'a b'   x='a', y='b'
+ * ' a b ' x='a', y='b'
+ * ':b'x='',  y='b'
+ * ':' x='',  y=''
+ * '::'x='',  y=''
+ * ': :'   x='',  y=''
+ * ':::'   x='',  y='::'
+ * ':b c:' x='',  y='b c:'
  */
 
 int
@@ -135,9 +88,14 @@ readcmd(int argc, char **argv)
char c;
int rflag;
char *prompt;
+   const char *ifs;
char *p;
+   int startword;
int status;
int i;
+   int is_ifs;
+   int saveall = 0;
+   ssize_t n;
 
rflag = 0;
prompt = NULL;
@@ -155,28 +113,28 @@ readcmd(int argc, char **argv)
}
if (*(ap = argptr) == NULL)
sh_error("arg count");
+   if ((ifs = bltinlookup("IFS")) == NULL)
+   ifs = " \t\n";
+
status = 0;
+   startword = 2;
backslash = 0;
STARTSTACKSTR(p);
for (;;) {
-   switch (read(0, &c, 1)) {
-   case 1:
-   break;
-   default:
-   if (errno == EINTR && !pendingsigs)
-   continue;
-   /* fall through */
-   case 0:
+   do
+   n = read(STDIN_FILENO, &c, 1);
+   while (n == -1 && errno == EINTR && !pendingsigs);
+   if (n != 1) {
status = 1;
-   goto out;
+   break;
}
if (c == '\0')
continue;
if (backslash) {
-   if (c == '\n')
-   goto resetbs;
-   STPUTC(CTLESC, p);
-   goto put;
+   backslash = 0;
+   if (c != '\n')
+ 

Re: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-04 Thread Jilles Tjoelker
On Sat, Sep 04, 2010 at 08:20:33PM +0200, Steve Schnepp wrote:
> 2010/9/3 Jilles Tjoelker :
> > This patch assumes that the file descriptor is discarded afterwards (its
> > position does not matter). Therefore the very common construct
> >  while read x; do
> >    ...
> >  done
> > stops working.

> Ohh.. thanks for that, I didn't see it.

> Actually "while read x" continues to work.
> But "reopening the file" doesn't as in :

> read a b < datafile
> echo ${a} ${b}
> read a b < datafile
> echo ${a} ${b}

You're right, it's even stranger than I expected.

> I attached an updated patch that corrects this pb by discarding the
> buffer when opening a new file.

This discarding is still bad as it throws away valid data if the open
file description is shared. This happens if stdin is redirected inside a
while read... loop.

Furthermore, I think constructions like
  read x; cat
and
  read x; (read y); read z
should keep working. This requires that the input's file position be
synced whenever another process may see it (fork/exit). Due to the
highly dynamic character of the shell and the common use of fd 0, this
probably means that you can't do better than syncing after each read
builtin. (For example, 'read' could be overridden with a function after
the third line.)

Another thought:
  exec 3<&0; read x; read y <&3
or even
  sh -c 'read x; read y <&3' 3<&0
Different file descriptors may refer to the same open file description
and the shell may not know this.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-03 Thread Jilles Tjoelker
On Thu, Sep 02, 2010 at 05:02:55PM +0200, Steve Schnepp wrote:
> 2010/9/1 Steve Schnepp :
> > conforming to POSIX isn't a realistic option, would it be possible to
> > have a workaround that doesn't involve an external tool like cat(1) ?

> Hi, I just hacked & attached a little patch away to be able to solve
> this case.
> Feel free to reply with your comments.

> NB: I just targeted dash-0.5.5.1, but it might apply to any version.

This patch assumes that the file descriptor is discarded afterwards (its
position does not matter). Therefore the very common construct
  while read x; do
...
  done
stops working.

A possible fix is to check first if the input supports seeking. If it
does, use the buffering and at the end of the line seek backwards for
the number of bytes remaining in the buffer. If it does not, read one
byte at a time.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-02 Thread Jilles Tjoelker
On Wed, Sep 01, 2010 at 10:10:11AM +0200, Steve Schnepp wrote:
> Hi, I opened bug 595063 on the debian BTS [1] and I was suggested to
> resend the email upstream.

> So I copied the body of the bug below :

> dash's read() builtin seems to read the underlying file 1 char at a
> time. This doesn't work with some files under /proc, since procfs isn't
> fully POSIX compliant.

> [snip]

> After a little digging, it only appears on files that contains just an
> integer value. When asked to read with a non-null offset (*ppos != 0),
> __do_proc_dointvec() just returns 0 (meaning an EOF) as shown on [2].

> I'm aware that the issue isn't strictly a dash one, since it has the
> right to read one character at a time. But since fixing procfs to be
> conforming to POSIX isn't a realistic option, would it be possible to
> have a workaround that doesn't involve an external tool like cat(1) ?

Given that other files in /proc do work, I don't see why the ones that
only contain an integer value cannot be fixed. All the necessary state
to produce the second and further bytes is available.

Choosing a powerful abstraction like a regular file has its
implications.

Note that a change in the file between the single-byte reads will cause
an inconsistent value to be read. This is also the case with regular
files on a filesystem, so it is acceptable.

If single-byte reads are really unacceptable, then the proper way to
read these files needs to be documented, and clear violations that will
not work properly should cause an error (in this case, this means that
reading one byte from offset 0 should fail like reading one byte from
offset 1 does).

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-24 Thread Jilles Tjoelker
On Tue, Aug 24, 2010 at 12:51:47AM +0200, Harald van Dijk wrote:
> On 23/08/10 21:35, Jilles Tjoelker wrote:
> > I think you should do what you think is best for the stability of your
> > product. Because dash releases are not extensively tested, I'd recommend
> > a trial build of at least a minimal base system with the new version you
> > choose. A particular feature to be wary of is LINENO support, as it will
> > cause most configure scripts to accept dash as a usable shell.

> Thanks, I'm aware of that. I already locally exported CONFIG_SHELL, so
> that even without LINENO support the configure scripts were already run
> from dash.

> That reminds me: the LINENO support is useful, but the tracking of line
> numbers has some issues:

> $ src/dash -c 'f() { echo $LINENO; }
> f
> f
> '
> 2
> 3

> But this is not new, and not limited to LINENO:

> $ cat >foo.sh
> if :; then
> foo
> :
> :
> :
> :
> :
> fi
> $ src/dash foo.sh
> foo.sh: 8: foo: not found
> $ bash foo.sh
> foo.sh: line 2: foo: command not found

> I have a patch that improves this by storing the line numbers in the
> command nodes, if you're interested, but it needs polishing before I
> plan on sending it here or anywhere, and there are probably some corner
> cases that it mishandles.

Yes, I think that's the proper way to implement LINENO.

FreeBSD sh avoids extending the nodes by detecting expansions of LINENO
at parse time and storing the line number at that time. However, this is
only possible because it does not print a line number when there is an
error in a builtin.

> [IFS=", "]
> > I think the important thing is that IFS characters are supposed to be
> > field terminators (see POSIX XCU 2.6.5 Field Splitting).

> > Therefore, in the example " 1 ,2 3," there are three fields, each
> > containing one digit, and each variable is assigned one of them.

> The more I read it, the more I'm actually becoming convinced that zsh is
> doing the right thing, and dash is almost doing the right thing.

> 2.6.5 uses the term "delimiter", not "terminator". They don't mean the
> same thing. A delimiter can mark the start of a field as well as the
> end. And if you compare susv2 with susv3, you may see susv2 is a lot
> clearer than v3 on one point, because it ends the "Field Splitting"
> section with a note.

> "The last rule can be summarised as a pseudo-ERE:

> (s*ns*|s+)

>  where s is an white-space character and n is a character in the
>  that is not white space. Any string matching that ERE delimits a
>  field, except that the s+ form does not delimit fields at the
>  beginning or the end of a line." (followed by an example)

> This says the s+ form does not delimit fields at the end of a line,
> which strongly implies that the s*ns* form does. The wording is wrong no
> matter how you look at it (splitting "a  " results in one field "a", not
> one field "a  "), and the note has been removed in susv3. Still, it
> manages to somewhat clarify the rest of text.

POSIX.1-2008 (aka SUSv4) says, at one point, that the shell shall "use
the delimiters as field terminators".

The specification of field splitting did indeed change at some point, so
that a final non-whitespace IFS character does not have a final empty
field after it. Likely, the old spec was not what was intended as the
System V sh behaved much like the new spec (though not exactly).
Generally, the POSIX shell command language is designed to match the
Bourne shell (as in System V) and ksh88, and deviations from this are
mentioned in the rationale. Note that this does not mean that either the
Bourne shell or ksh88 are compliant.

> > In the example " 1 ,2 3,," there are four fields, the last being empty.
> > Then c is assigned the third field plus the delimiter character and the
> > remaining fields and their delimiters except trailing whitespace that is
> > in IFS. Hence, both commas end up in c.

> The read command description states:

> "If there are fewer var operands specified than there are fields, the
>  leftover fields and their intervening separators shall be assigned to
>  the last var."

> If " 1 ,2 3,," forms four fields, where the fourth field is the empty
> string between the two trailing commas, then the final comma is not an
> "intervening separator", so it should be excluded from c.

The description of the read utility is also different in POSIX.1-2008.
It does not talk about "intervening separators" but about "delimiters"
in general.

The intention is that if there are more fields than variables, the final
variable

Re: trap bug in recent versions of dash

2010-08-23 Thread Jilles Tjoelker
On Mon, Aug 23, 2010 at 12:40:48PM +0200, Guido Berhoerster wrote:
> * Jilles Tjoelker  [2010-08-23 00:32]:
> > If you want to try something, here is a patch. I have verified that the
> > only change to the results of FreeBSD sh's testsuite is that the test
> > builtins/break2.0 starts working (there are still 51 other broken
> > tests). There is no change in output from the posh testsuite (run with
> >   -C sh,posix,no-typeset,no-arrays,no-coprocs,no-herestrings,no-history
> > ).

> > diff --git a/src/eval.c b/src/eval.c
> > index d5e5c95..e484bec 100644
> > --- a/src/eval.c
> > +++ b/src/eval.c
> > @@ -307,9 +307,9 @@ setstatus:
> > break;
> > }
> >  out:
> > -   if ((checkexit & exitstatus) ||
> > -   (pendingsigs && dotrap()) ||
> > -   (flags & EV_EXIT))
> > +   if (pendingsigs)
> > +   dotrap();
> > +   if ((flags & EV_EXIT) || (checkexit & exitstatus))
> > exraise(EXEXIT);
> >  }

> Unfortunately this seems to corrupt variables.
> See the attached test script, after the TERM signal $value
> is not empty anymore but contains garbage.

I think this is the same strange bug as Harald van Dijk reported. The
below gives incorrect output, and quoting the command substitution works
around the issue:

dash -c 'printf "a\t\tb\n" | { IFS=$(printf "\t") read a b c; echo 
"|$a|$b|$c|"; }'

In your script, if I write  read "$2"  instead of  read $2  it works.

> Where can I find FreeBSD's sh tests?

You need the tools/regression/bin/sh directory from the head (cvs: HEAD)
branch of the FreeBSD src repository. See
http://www.freebsd.org/developers/cvs.html for access methods. There are
also unofficial git, hg and other mirrors if you prefer that.

The tests are designed to run under perl's prove(1) tool.

The tests test the first instance of "sh" in the PATH. To test other
shells, create a symlink named sh in a new directory and add that
directory in front of your PATH. This is a little clumsy but also
ensures bash and zsh enable their compatibility modes.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-23 Thread Jilles Tjoelker
On Mon, Aug 23, 2010 at 02:03:01AM +0200, Harald van Dijk wrote:
> On 23/08/10 01:00, Jilles Tjoelker wrote:
> > On Mon, Aug 23, 2010 at 12:20:12AM +0200, Harald van Dijk wrote:
> >>[...]
> >>   echo a:b | { IFS=: read a b; echo $a; }
> >>[...]
> > 
> > This has already been fixed in a totally different way in master. See
> > git commits near 95a60b2936e8835963bfb08eadc0edf9dddf0498.

> Interesting, thank you for the reference. That commit (from May) does
> a lot more than fix the read problem, and does not specifically
> address the read problem at all, which is probably why it has not made
> it into 0.5.6.1 (from June), and why I had not found it.

> The IFS problem was breaking the build of glibc, and reportedly also
> causing an unbootable system for someone because of a bad build of
> libusb, so it was important to fix it somehow for Gentoo. Would you
> prefer I continue to use a version of the patch I posted, or use the
> more invasive commits from master?

I think you should do what you think is best for the stability of your
product. Because dash releases are not extensively tested, I'd recommend
a trial build of at least a minimal base system with the new version you
choose. A particular feature to be wary of is LINENO support, as it will
cause most configure scripts to accept dash as a usable shell.

> >>$ src/dash -c 'printf "a\t\tb\n" | { IFS=$(printf "\t") read a b c; 
> >> echo "|$a|$b|$c|"; }'
> >>|a||b|
> >>$ src/dash -c 'printf "a\t\tb\n" | { IFS="$(printf "\t")" read a b c; 
> >> echo "|$a|$b|$c|"; }'
> >>|a|b||
> >>[...]
> > Ick. Your change will probably work, but it remains sneaky action at a
> > distance. To reduce complexity, it would be good to implement read's
> > splitting without using expand.c.

> read used to have its own splitting code, but it was changed to use
> expand.c some time between 0.5.5 and 0.5.6.1. The old splitting code
> had bugs that expand.c did not have.

Right, but it seems better to fix the bugs rather than entangle things
with expand.c and still leave bugs.

> > I estimate the extra lines of code
> > from importing FreeBSD's code at less than 50. It will also fix an edge
> > case with the splitting. The last two lines of FreeBSD's
> > builtins/read1.0 test are:

> > echo " 1 ,2 3," | { IFS=', ' read a b c; echo "x${a}x${b}x${c}x"; }
> > echo " 1 ,2 3,,"| { IFS=', ' read a b c; echo "x${a}x${b}x${c}x"; }

> > These should result in:

> > x1x2x3x
> > x1x2x3,,x

> > bash and ksh93 agree on this. However, dash master prints:

> > x1x2x3,x
> > x1x2x3,,x

> This is also what zsh prints. Could you explain why bash and ksh are
> correct? By my reading, when IFS=', ', then splitting " 1 ,2 3,"
> results in "1", "2", "3", and "". Leading and trailing IFS white space
> gets ignored, but the trailing , is not white space, so makes the
> total field count 4. Since there are four fields, a is assigned "1", b
> is assigned "2", c is assigned "3," which is "3", the separator ",",
> and the final field "". But I'll be the first to admit I'm far from an
> expert, and I trust bash to get these things right, I'm just curious
> why bash is right.

> That said, zsh is consistent here, and dash is not. zsh also makes
>   x=" 1 ,2 3,"
>   IFS=", "
>   set $x
>   echo $#
> print out 4, while dash prints 3 in this case. So either way,
> something is wrong.

I think the important thing is that IFS characters are supposed to be
field terminators (see POSIX XCU 2.6.5 Field Splitting).

Therefore, in the example " 1 ,2 3," there are three fields, each
containing one digit, and each variable is assigned one of them.

In the example " 1 ,2 3,," there are four fields, the last being empty.
Then c is assigned the third field plus the delimiter character and the
remaining fields and their delimiters except trailing whitespace that is
in IFS. Hence, both commas end up in c.

Likewise, in your example with normal field splitting, dash is right and
zsh is wrong.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Partial patch] IFS and read builtin

2010-08-22 Thread Jilles Tjoelker
On Mon, Aug 23, 2010 at 12:20:12AM +0200, Harald van Dijk wrote:
> Hi, as has been reported already dash currently has a bug where the
> read builtin ignores the read environment's IFS setting. As a result,

>   echo a:b | { IFS=: read a b; echo $a; }

> will write out a:b. I tried to see what changed between 0.5.5.1 and
> 0.5.6, and found that the old code used bltinlookup("IFS"). So, I made
> the new code also use that. The above example works properly with the
> attached patch.

This has already been fixed in a totally different way in master. See
git commits near 95a60b2936e8835963bfb08eadc0edf9dddf0498.

> However, testing further, I found that there is bad interaction (even
> without my patch, where the IFS assignment below should just be
> ignored) between the code that handles variable assignments, and read.
> Try this:

>$ src/dash -c 'printf "a\t\tb\n" | { IFS=$(printf "\t") read a b c; echo 
> "|$a|$b|$c|"; }'
>|a||b|
>$ src/dash -c 'printf "a\t\tb\n" | { IFS="$(printf "\t")" read a b c; echo 
> "|$a|$b|$c|"; }'
>|a|b||

> This happens because expbackq is correctly called without EXP_QUOTED,
> which makes it call recordregion, which isn't cleared by the time read
> calls ifsbreakup. I'm not sure how that should be solved, and if there
> are more cases where it would fail. The simplest way to solve this for
> read is to call removerecordregions(0) before recordregion(0, len - 1,
> 0). I have tested that locally, and it works. I am not familiar enough
> with the code to judge whether the same situation can also happen in
> other cases that would also need fixing, which is why I have not
> included that part in the attached patch.

Ick. Your change will probably work, but it remains sneaky action at a
distance. To reduce complexity, it would be good to implement read's
splitting without using expand.c. I estimate the extra lines of code
from importing FreeBSD's code at less than 50. It will also fix an edge
case with the splitting. The last two lines of FreeBSD's
builtins/read1.0 test are:

echo " 1 ,2 3," | { IFS=', ' read a b c; echo "x${a}x${b}x${c}x"; }
echo " 1 ,2 3,,"| { IFS=', ' read a b c; echo "x${a}x${b}x${c}x"; }

These should result in:

x1x2x3x
x1x2x3,,x

bash and ksh93 agree on this. However, dash master prints:

x1x2x3,x
x1x2x3,,x

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trap bug in recent versions of dash

2010-08-22 Thread Jilles Tjoelker
On Mon, Aug 16, 2010 at 01:52:56PM +0200, Guido Berhoerster wrote:
> * Jilles Tjoelker  [2010-08-15 22:05]:
> > On Wed, Aug 11, 2010 at 10:06:16AM +0200, Guido Berhoerster wrote:
> > > with the latest git version of dash trap actions are not
> > > evaluated in the context of a function.

> > I think dotrap()'s return value should be removed or at least ignored.
> > An "evalskip" (break/continue/return-function/return-file) should never
> > lead to an immediate exit. I'm not supplying a patch because I am not
> > entirely sure about the side effects this could have.

> OK, I'm not familiar with the source but it'd be nice to have it
> fixed before the next release since it is a regression breaking
> scripts.

If you want to try something, here is a patch. I have verified that the
only change to the results of FreeBSD sh's testsuite is that the test
builtins/break2.0 starts working (there are still 51 other broken
tests). There is no change in output from the posh testsuite (run with
  -C sh,posix,no-typeset,no-arrays,no-coprocs,no-herestrings,no-history
).

diff --git a/src/eval.c b/src/eval.c
index d5e5c95..e484bec 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -307,9 +307,9 @@ setstatus:
break;
}
 out:
-   if ((checkexit & exitstatus) ||
-   (pendingsigs && dotrap()) ||
-   (flags & EV_EXIT))
+   if (pendingsigs)
+   dotrap();
+   if ((flags & EV_EXIT) || (checkexit & exitstatus))
exraise(EXEXIT);
 }

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trap bug in recent versions of dash

2010-08-15 Thread Jilles Tjoelker
On Wed, Aug 11, 2010 at 10:06:16AM +0200, Guido Berhoerster wrote:
> with the latest git version of dash trap actions are not
> evaluated in the context of a function.

I think dotrap()'s return value should be removed or at least ignored.
An "evalskip" (break/continue/return-function/return-file) should never
lead to an immediate exit. I'm not supplying a patch because I am not
entirely sure about the side effects this could have.

Related, there may be a bug with SKIPFILE. This constant is never tested
against, and this makes 'break' inside a dot script behave strangely.
Example (this must be saved to a file, as it sources itself):

if [ "$1" != nested ]; then
while :; do
set -- nested
. "$0"
echo bad2
exit 2
done
exit 0
fi
break
echo bad1
exit 1

> The following script demonstrates the bug:
> 8<
> read_timeout () {
> saved_traps="$(trap)"

This does not work in dash, it always returns an empty string because
trap is evaluated in a subshell.

Some other ash variants do not evaluate most command substitutions
containing only a single builtin in a subshell, but this was removed in
NetBSD sh and dash because such commands can affect the shell in wrong
ways (e.g. $(exit 1) causes FreeBSD sh to exit).

A recent or proposed POSIX interpretation has said that $(trap) should
work, and that this may be done by treating a lone trap command
specially or by having trap in a subshell output the parent's traps
until a trap has been set in the subshell. To help the former case,
stuff like TRAP=trap; y=$($TRAP) is not required to work.

A problem in the script is that it does not handle TERM set to the
default action properly, as this is not included in trap's output.

> trap 'printf "timed out\n"; eval "${saved_traps}"; return' TERM
> ( sleep $1; kill -TERM $$ ) >/dev/null 2>&1 &

For portability, I recommend braces
{ sleep $1; kill -TERM $$; } >/dev/null 2>&1 &

Some shells do not treat a background subshell specially and fork twice,
which would cause the wrong process to be killed below.

> timer_pid=$!
> read $2
> kill $timer_pid 2>/dev/null

eval "${saved_traps}"  is missing here.

> }

> read_timeout 5 value
> printf "read \"%s\"\n" "${value:=default}"

> >8
> The return statement in the trap inside the read_timeout function
> does not return from the function but rather exits the script.

> With dash 0.5.5.1 it works as expected.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [VAR] Fix history (libedit only).

2010-08-13 Thread Jilles Tjoelker
Git commit 0df96793ef6aa103df228d7dfe56099b7d721a15 "[SHELL] Add
preliminary LINENO support" added the LINENO variable in the middle of
other initialized variables, causing some macros for TERM and HISTSIZE
to break (both of these are only used if libedit support is compiled in,
which is not the case by default).

The breakage is the same as can be seen by setting HISTSIZE=0.

Also add a comment warning about this.

The breakage was reported by Wez Furlong.
---
 src/var.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/src/var.c b/src/var.c
index 3efc943..25c2216 100644
--- a/src/var.c
+++ b/src/var.c
@@ -78,6 +78,7 @@ const char defifsvar[] = "IFS= \t\n";
 const char defifs[] = " \t\n";
 #endif
 
+/* Some macros in var.h depend on the order, add new variables to the end. */
 struct var varinit[] = {
 #if ATTY
{ 0,VSTRFIXED|VTEXTFIXED|VUNSET,"ATTY\0",   0 },
@@ -94,11 +95,11 @@ struct var varinit[] = {
{ 0,VSTRFIXED|VTEXTFIXED,   "PS2=> ",   0 },
{ 0,VSTRFIXED|VTEXTFIXED,   "PS4=+ ",   0 },
{ 0,VSTRFIXED|VTEXTFIXED,   "OPTIND=1", getoptsreset },
-   { 0,VSTRFIXED|VTEXTFIXED,   "LINENO=1", 0 },
 #ifndef SMALL
{ 0,VSTRFIXED|VTEXTFIXED|VUNSET,"TERM\0",   0 },
{ 0,VSTRFIXED|VTEXTFIXED|VUNSET,"HISTSIZE\0",   sethistsize },
 #endif
+   { 0,VSTRFIXED|VTEXTFIXED,   "LINENO=1", 0 },
 };
 
 STATIC struct var *vartab[VTABSIZE];
-- 
1.7.1.1
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "fc -l" broken between v0.5.5.1 and v0.5.6

2010-08-08 Thread Jilles Tjoelker
On Mon, Jul 26, 2010 at 10:17:43AM -0700, Wez Furlong wrote:
> When building with libedit, I get the following behavior:

> % ./src/dash -E
> $ fc -l
> fc: 1: history number -16 not found (internal error)
> $

> dialling back to v0.5.5.1:

> % ./src/dash -E
> $ fc -l
> 1 fc -l
> $

> It's totally invisible to me the cause of this regression, otherwise I
> would have also included a patch!

> This occurs on both darwin and centos 5.3.

> Let me know if I can provide more information,

Please bisect this to find the commit that introduced the problem. See
git help bisect  or
http://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html .

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: The Greek letter "rho" is considered as two letters

2010-08-08 Thread Jilles Tjoelker
On Sat, Aug 07, 2010 at 10:57:12PM +0300, Alkis Georgopoulos wrote:
> Erm actually this problem happens with all utf8 characters, i.e. dash
> does not properly take utf8 characters into account when expanding "?".

> $ touch appétit  
> $ ls app?tit
> ls: cannot access app?tit: No such file or directory
> $ ls app??tit
> appétit

Yes, it seems that dash has zero support for locales. In some ways this
is an advantage, as locale support can make things considerably slower
and configure/startup scripts don't need it. However, it leads to
inconsistent behaviour with other utilities that do support locales.

For FreeBSD's /bin/sh, which is another ash variant, I think some degree
of locale support (at least for utf-8) is desirable at some point. This
would include changing pattern matching and ${#var}.

I don't know what Herbert Xu thinks about this.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No wildcard expansion with redirected filenames

2010-08-08 Thread Jilles Tjoelker
On Sat, Aug 07, 2010 at 10:34:21PM +0300, Alkis Georgopoulos wrote:
> Isn't this supposed to work?

> $ touch file1
> $ echo 1 > file?
> $ ls file? 
> file?  file1

> I.e. dash actually created a file named 'file?' instead of globbing.

No, this is not supposed to work. POSIX says that a non-interactive
shell must not generate pathnames for a redirection; an interactive
shell may do so, provided there is exactly one match.

Dash is primarily meant as a non-interactive shell and wants to be as
small as possible, and therefore never generates pathnames for a
redirection.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Some utf-8 characters are cut in the middle on redirections

2010-08-08 Thread Jilles Tjoelker
On Sat, Aug 07, 2010 at 11:05:42PM +0300, Alkis Georgopoulos wrote:
> $ touch ??
> $ echo 1 > ??
> $ ls
> ?  ??
> $ ls | hexdump -C
>   cf 0a cf 81 0a

> dash "cut" the "0x81" from the "0xcf81" representation of the Greek
> character rho, and so it created another file named "0xcf".

> This only happens for a few utf-8 characters, not for all of them.
> Relevant bug report:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=532302

This bug is fixed in git commit f8231aea37e921492fc7fbd972385ab5b90e8627
which is not in any dash release yet.

The patch should apply to older dash versions also.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


debian patches to exit with code 127 for nonexistent/directory scripts

2010-06-05 Thread Jilles Tjoelker
Debian's dash package has some local changes which cause an exit with
code 127, as required by POSIX, if a script (passed with dash
) cannot be opened or cannot be read because it is a
directory.

Unfortunately, these patches also affect the . builtin (if the pathname
contains a slash) and use EXEXIT, which means such errors always cause
the shell to exit, even in interactive mode or if the builtin's
specialness has been disabled using command.

% dash
$ . ./nonexistent
.: 1: Can't open ./nonexistent
zsh: exit 127   dash
%

% dash -c 'command . ./nonexistent; echo continued'

Note: Do not compare this with bash. Bash deliberately does not follow
POSIX XCU 2.8.1 Consequences of Shell Errors if not in POSIX mode, and
even in POSIX mode trying to source a nonexistent dot script (without
slash in the pathname) fails to abort the shell.

Note 2: POSIX seems unclear about what 'command .' should do, but is
very clear that failure to find/read a dot script shall not cause an
interactive shell to exit.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] [VAR] Replace cmdenviron with localvars

2010-05-29 Thread Jilles Tjoelker
On Thu, May 27, 2010 at 01:51:41PM +1000, Herbert Xu wrote:
> In order to ensure that local still works, it is now a special
> built-in.

Although there are reasons to do this (a 'local' function containing
'command local' would not work, and export, readonly and ksh93's typeset
are special as well), consensus seems to be for local to be a regular
builtin. posh's local was special for a short time, but this was then
reverted.

Nevertheless, it is very nice to have IFS=: read ... working again.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Job control bug in revision 3800d4934391b,

2010-05-28 Thread Jilles Tjoelker
On Thu, May 27, 2010 at 08:21:57PM -0400, Kris Maglione wrote:
> I'm not sure how to describe this bug, but it's affected one of my
> scripts, and those of several of my users. Basically, we've had loops
> dieing when backgrounded programs exit. This is the simplest test case
> I can come up with:

> #!/bin/dash
> {
>   echo foo
>   sleep 1
>   echo foo
>   echo done>/dev/tty
> } | while read p; do
>   ( echo good & ) &
> done
> echo done

> In versions prior to 3800d4934391b, the output would
> "good\ndone\ndone\ngood" (or some permutation thereof depending on
> system load), but from 3800d4934391b on, it's "good\ndone".

> The offending revision:

> [JOBS] Fix dowait signal race

I can reproduce this here. It happens because the child process exit's
SIGCHLD causes an EINTR on the read, which then fails.

While the above testcase works with FreeBSD sh, a similar script with a
trap on CHLD fails in both (and likely also in older dash):

#!/bin/dash
{
echo foo
sleep 1
echo foo
echo done>/dev/tty
} | { trap : CHLD; while read p; do
( echo good & ) &
done }
echo done

Most other shells do not seem to abort a read builtin for a trap. bash
and ksh93 execute the trap right away and then continue the read, while
mksh postpones it until the read is done. POSIX seems vague about it,
but a look at existing scripts suggests that traps should not abort a
read. Executing the trap right away is useful in case it does something
such as exiting, and should be safe given that traps are already invoked
from deeper levels of recursion of the interpreter. If the trap executes
a break, continue or return command outside its boundaries this should
abort the read.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] [VAR] Replace cmdenviron with localvars

2010-05-26 Thread Jilles Tjoelker
On Wed, May 26, 2010 at 09:00:34PM +1000, Herbert Xu wrote:
> [VAR] Replace cmdenviron with localvars

> This patch replaces the cmdenviron mechanism for temporary command
> variables with the localvars mechanism used by functions.

> This reduces code size, and more importantly, makes the variable
> assignment take effect immediately as required by POSIX.

This change breaks passing variable assignments to regular builtins.
For example,
  dash -c 'HOME=/ cd; pwd'
This should print /. (For some reason, this does not work in ksh93
either.)
Or
  dash -c 'cd /bin&&CDPATH=/ cd bin; pwd'
This should print /bin twice and not fail any command. (Again this fails
in ksh93.)

A more common example,
  IFS=: read -r x y z
is already broken in master before this change (due to git commit
55c46b7286f5d9f2d8291158203e2b61d2494420 which effectively replaced
bltinlookup("IFS") with ifsval()).

Similar issues may pop up with
  PATH=/foo command -V bar
  PATH=/foo type bar
  LC_NUMERIC=de_DE.UTF-8 printf '%f\n' 1,2

(Note: the latter is broken in bash, but in ksh93 and zsh it works, as
well as with an external printf(1) program.)

'local' may need additional magic to avoid making things local to the
temporary regular-builtin scope. I suppose only functions should induce
scope for 'local', such that things like command eval 'local i' continue
to create a variable local to the function.

-- 
Jilles Tjoelker
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >