Bug#759230: shell: Always use explicit large file API

2020-05-09 Thread Jilles Tjoelker
On Thu, May 07, 2020 at 11:42:12PM +1000, Herbert Xu wrote:
> There are some remaining stat/readdir calls in dash that may lead
> to spurious EOVERFLOW errors on 32-bit platforms.  This patch changes
> them (as well as open(2)) to use the explicit large file API.

> Reported-by: Tatsuki Sugiura 
> Signed-off-by: Herbert Xu 

> diff --git a/configure.ac b/configure.ac
> index 5dab5aa..dbd97d8 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -144,8 +144,13 @@ AC_CHECK_FUNC(stat64,, [
>   AC_DEFINE(fstat64, fstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(lstat64, lstat, [64-bit operations are the same as 32-bit])
>   AC_DEFINE(stat64, stat, [64-bit operations are the same as 32-bit])
> + AC_DEFINE(readdir64, readdir,
> +   [64-bit operations are the same as 32-bit])
> + AC_DEFINE(dirent64, dirent,
> +   [64-bit operations are the same as 32-bit])
>  ])
> [snip]

Is it possible to use AC_SYS_LARGEFILE and the normal
fstat/lstat/readdir/dirent/open/etc instead of the non-standard "64"
interfaces?

I understand that dash formerly used the "64" interfaces selectively, so
that the binary could be a bit smaller by avoiding 64-bit numbers where
they were not necessary. Now that the feature "supports files with inode
numbers that do not fit in 32 bits" is considered essential, this
complexity seems unnecessary.

I'm sorry for not providing a patch.

-- 
Jilles Tjoelker



Bug#953421: dash: Resident Set Size growth is unbound (memory leak) on an infinite shell loop

2020-03-29 Thread Jilles Tjoelker
On Sun, Mar 29, 2020 at 08:06:31PM +0100, Harald van Dijk wrote:
> On 29/03/2020 18:54, Vitaly Zuevsky wrote:
> > I have now fixed this bug locally.

> > The leak is in jobtab array (jobs.c). I concluded that the most
> > logical approach would be eliminating inconsistency between
> > makejob() and dowait() functions. My fix in a forked repo:

> > https://salsa.debian.org/psvz-guest/dash/-/commit/5e3ea90cb3355d1308c482661a471883d36af5e7

> This change is incorrect. The reason dash keeps on allocating memory is
> because dash needs to keep on allocating memory. Consider this script:

>   set -- $(seq 1 100)
>   for i
>   do
> : &
> sleep .1
>   done
>   for i
>   do
> wait %$i
>   done

> This is a valid script and works fine in dash. Your change breaks this by
> not keeping the jobs around long enough, and I hope this test script shows
> that there is no way to keep the jobs around long enough but by allocating
> ever more memory.

I agree that the change is incorrect, but I do not agree that this kind
of script must leak memory. Per POSIX.1-2008 XCU 2.9.3.1 Asynchronous
Lists, an implementation has additional ways to forget about jobs than
just an appropriate completion of the wait utility: if another
asynchronous job is started when $! was not referenced or if the number
of known process IDs would exceed {CHILD_MAX} (which tends to be rather
big, though).

POSIX does not seem to expect using % in scripts like this; it
seems highly fragile to me anyway (although $! has problems with process
ID reuse).

FreeBSD sh implements forgetting when $! was not referenced (and the job
has terminated), but not the {CHILD_MAX} limit. This avoids the
increasing memory usage in the example script.

> Your change makes it impossible to keep track of the background process's
> status, but if you do not care about that anyway, you can avoid the
> increasing memory use without modifying dash by launching a background
> process without including it in the current shell's job table, by launching
> it from a subshell:

>   while true
>   do
> (true &)
> sleep .1
>   done

Certainly a good idea.

Another option may be to include regular invocations of the wait utility
without parameters, although this is not suitable for all scripts.

-- 
Jilles Tjoelker



Bug#788979: dash: "Argument list too long" error on not too long argument

2015-08-15 Thread Jilles Tjoelker
On Sat, Aug 15, 2015 at 02:13:22AM +0200, Vincent Lefevre wrote:
> On 2015-08-14 20:16:27 +0200, Jilles Tjoelker wrote:
> > This is not a bug in dash but either in the Linux kernel or in glibc.

> > The existence of MAX_ARG_STRLEN (see execve(2)) could be considered a
> > bug, since POSIX does not provide a hint that a limit on the length of
> > an individual argument may exist.

> Actually such a limitation seems to be allowed by POSIX:

>   [E2BIG]
> Argument list too long. The sum of the number of bytes used by
> the new process image's argument list and environment list is
> greater than the system-imposed limit of {ARG_MAX} bytes.
> or:
> Lack of space in an output buffer.
> or:
> Argument is greater than the system-imposed maximum.

> This is the third case, but then, the error message is incorrect
> ("Argument list too long" corresponds to the first case). A glibc
> bug in this case? There's the same message with bash and zsh.

I think that condition is intended for different functions, since it is
mentioned in XSH 2.3 Error Numbers and not in the exec page in XSH 3
System Interfaces.

Although POSIX allows implementation-specific error conditions, they may
be a bad idea because they hinder portability.

> > However, it seems unlikely that this bug will be fixed given that it
> > has existed for so long already and was deliberately introduced.

> Perhaps it made sense 10 years ago, but nowadays this limit is
> rather ridiculous (makes me think of Microsoft's 640 KB limit).

I think this was that kind of limit, yes, although it introduced
dependency on gmake smarts years ago (in Linux, it's common for UTILITY
MANY_ARGS to be acceptable but sh -c 'UTILITY MANY_ARGS' to fail with
[E2BIG]).

> > Alternatively, the bug is an incorrect return value from
> > sysconf(_SC_ARG_MAX). Given the existence of MAX_ARG_STRLEN, the value
> > returned should never be more than MAX_ARG_STRLEN (131072 on most
> > architectures), since that is the longest total length of arguments that
> > is guaranteed to avoid [E2BIG] errors.

> Since POSIX allows such a system limit in addition to the total
> size, I'd say that the sysconf value is correct, but the error
> message is not the correct one (see above).

> > The [E2BIG] from "/bin/true" by itself seems strange and does not match
> > the bug's descriptive text.

> This problem seems to be specific to dash.

In interactive mode outside functions, dash sets the variable _ to the
last argument after executing a simple command, but normally does not
export it. Both bash and zsh export the variable, though, so when dash
is started from bash or zsh, each command gets the last argument of the
previous command in its environment. This may cause [E2BIG] errors.

Bash and zsh set _ before running the command, so the program gets its
own last argument and it is unlikely [E2BIG] will occur that would not
have occurred without _.

-- 
Jilles Tjoelker



Bug#759230: dash: glob fails on large volume (64bit inodes) with i386 binary.

2014-09-28 Thread Jilles Tjoelker
On Mon, 25 Aug 2014 22:32:00 +0900 Tatsuki Sugiura  wrote:
> Package: dash
> Version: 0.5.7-4
> Severity: normal

> When I create i386 chroot environment on filesystem that
> have 64bit inode (large xfs/btrfs volume or xfs on 64bit kernel),
> dash glob may not work properly by useing 32-bit APIs.

> By this reason, pbuilder may fail when building i386
> package on amd64 host with xfs/btrfs volume.

> I hope to work i386 chroot on amd64 if it's possible.

Dash is careful to use largefile APIs only when needed. However,
non-largefile stat()/lstat()/fstat()/readdir() are always wrong because
of 64-bit inode numbers. Therefore, the only correct use of
non-largefile APIs are the opens of scripts, /dev/tty and /dev/null (and
maybe not even the latter, if /dev/null has a file offset). Is it still
worth the complexity to use largefile only selectively?

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#753674: [dash] dash aborts a script, when 'shift' tested by 'while' fails

2014-09-28 Thread Jilles Tjoelker
On Fri, 04 Jul 2014 11:07:28 +0200 Andre Naujoks  wrote:
> I just noticed Bug #627856, which deals with a very similar issue.

> Is a shift without an argument also considered a syntax error if it is
> called too often? If yes, then this bug can be closed and I am sorry
> for the noise.

I'd say it's the same thing: "shift" is the same as "shift 1".

> IMHO this should not be considered a syntax error, because it is
> (again IMHO) a valid way of getting all arguments of a script or shell
> function.

Your code

] echo $1
] while shift; do
] echo $1
] done

looks a bit strange. It only works properly if there is at least one
argument to start with, and duplicates the handling for the first
argument (although you can fix that by placing that piece of code
directly after 'while'). It would be normal in Perl where 'shift'
returns the value shifted out but the shell's 'shift' does not return
that.

It looks more normal to write code like:

] while [ "$#" -gt 0 ]; do
] echo $1
] shift
] done

or

] for arg do
] echo $arg
] done

and then there is no issue with shift failing.

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#755446: dash: Should pass SA_RESTART to sigaction

2014-09-28 Thread Jilles Tjoelker
On Mon, 21 Jul 2014 00:19:33 +0200 Samuel Thibault  wrote:
> Package: dash
> Version: 0.5.7-4
> Severity: normal
> User: debian-h...@lists.debian.org
> Usertags: hurd

> We have been having issues on hurd-i386 with the dash shell, we often
> get "cannot create /dev/null: Interrupted system call" errors. With
> research from Svante Signell, we have found that this is actually due
> to a signal which unfortunately interrupts the open() call used by
> dash to redirect some output. dash indeed does not set the SA_RESTART
> flag when calling sigaction for SIGINT or SIGCHLD. There can probably
> be other calls in dash that could get interrupted too. dash seems to
> take care of at least some of them by handling EINTR, but it's much
> safer to simply let the system handle all of them by setting
> SA_RESTART. The attached patch does this, and fixes the issue on
> hurd-i386, could you consider applying it?

This breaks interrupting blocking redirection opens with ^C. Although
relying on [EINTR] like this leaves a race window (where SIGINT arrives
between the last check and the open blocking), if the open has blocked
"long enough", SIGINT will make it fail with [EINTR]. In most cases,
dash avoids this race by longjmp from the signal handler but for a
blocking open that approach may leak the file descriptor.

The cause of the regression is the SIGCHLD handler which is always
enabled even if there is no SIGCHLD trap. It will probably help to use
SA_RESTART for the SIGCHLD handler if there is no trap (therefore,
setting a trap should disable SA_RESTART). In FreeBSD sh, another ash
derivative, I have been even more careful, only enabling the SIGCHLD
handler while it is needed (during a wait builtin).

Apart from that, my FreeBSD/Linux-centric view says that opening
/dev/null should not fail with [EINTR], just like a read or write
from/to a regular file on a "normal" filesystem (an intr NFS mount is
not normal).

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#745653: dash does not squash // to / when doing `cd //`

2014-05-17 Thread Jilles Tjoelker
In Debian bug #745653, you wrote:
> If I envoke cd in dash via:

> cd //

> it incorrectly reports my cwd at // instead of /

POSIX permits this (for Cygwin and the like that want to have special
pathnames starting with //). Three or more slashes are equivalent to
one.

You might want to use cd's -P option to force resolution of symlinks;
this will also resolve // down to / if they are equivalent.

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#746186: dash: Using wildcard in redirect doesn't work

2014-05-17 Thread Jilles Tjoelker
In Debian bug #746186, you wrote:
> I was trapped by this strange bug. It seems that wildcards ("?" and
> "*") are not resolved when used in redirection.

This is not a bug, but required by the POSIX specification. Per POSIX, a
non-interactive shell shall not generate pathnames in redirections; an
interactive shell may, provided exactly one pathname results.

Dash chooses the simplest thing that can be compliant here: it never
generates pathnames in redirections.

You can see the same behaviour in scripts with bash --posix or ksh93.

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#733852: dash: spurious "> " output after Ctrl-V Ctrl-J

2014-03-01 Thread Jilles Tjoelker
In Debian bug #733852, Vincent Lefevre wrote:
> When one types Ctrl-V Ctrl-J (to make a ^J control character appear
> on the command line), dash outputs a spurious "> " string (one by
> ^J occurrence) once the command is validated. For instance:

> $ echo "ab^Jcd^Jef"
> > > ab
> cd
> ef
> $  true "ab^Jcd^Jef"
> > > $

> where the ^J has been entered with Ctrl-V Ctrl-J.

I think this is a normal consequence of using the kernel line editing
("canonical mode", ICANON) instead of something like libedit or
libreadline. In the sequence of bytes returned from read(), Ctrl+V
Ctrl+J looks the same as .

One difference is that the kernel returns (parts of) one line in a
single read() call, which could be used to distinguish between the two
kinds of newline. This is not fully reliable because a newline at the
last byte of a read call that fills the buffer completely cannot be
analyzed this way; both Linux and FreeBSD permit longer lines than their
advertised {MAX_CANON} so the problem cannot be avoided with a
{MAX_CANON}+1 buffer.

I expect that the extra ugliness caused by this incomplete check will
not be worth it. It would be better to enable libedit or to use a
version of linenoise (link it statically to avoid the impact on fork()
performance).

By the way, you can also have a "spurious" "$ ":
$ echo a^Jecho b
a
$ b
$ 

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#726735: dash: redirect stdin from stdin to override asynchronous list redirection fails

2013-12-08 Thread Jilles Tjoelker
Hi Serge, ltns,

In Debian bug #726735, you wrote:
> Dash redirects stdin from /dev/null when a command is started
> asynchronously, as specified by POSIX. However, POSIX also specifies
> that this can be overridden by explicitely redirecting. This fails
> when redirecting stdin from stdin.

> The relevant text from POSIX (
> http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_03_02
> ):
> ] The standard input for an asynchronous list, before any explicit
> ] redirections are performed, shall be considered to be assigned to a file
> ] that has the same properties as /dev/null. If it is an interactive
> ] shell, this need not happen. In all cases, explicit redirection of
> ] standard input shall override this activity.

> To test: create the following script:
> #!/bin/dash
> sleep 100 <&0 &

> Run this, and then look at the open file descriptors of the sleep
> process:
> ls -l /proc/`pgrep sleep`/fd
> The output is as follows:
> lr-x-- 1 svdb svdb 64 Oct 18 15:58 0 -> /dev/null
> lrwx-- 1 svdb svdb 64 Oct 18 15:58 1 -> /dev/pts/2
> lrwx-- 1 svdb svdb 64 Oct 18 15:58 2 -> /dev/pts/2
> fd 0 is assigned to /dev/null instead of /dev/pts/2

In the example script, the redirection happens in the child process.
Therefore, the shell cannot, in the general case, know about this
redirection when it forks the child process. For some reason, bash still
overrides the implicit 

Bug#716789: Bug #716789: dash: sometimes misbehaves when backticks are included after an || operator

2013-07-20 Thread Jilles Tjoelker
In Debian bug #716789, you wrote:
> [cat <

Bug#713805: Bug #713805: dash: does not pass/recognize parameters to/in sourced script

2013-06-22 Thread Jilles Tjoelker
In Debian bug #713805, you wrote:
> [dot ignores parameters after the script pathname]

POSIX does not specify what happens if dot is given more than one
parameter, and dash simply ignores the excess. I think it is unlikely
that dash will add the extension of temporarily setting the excess (if
not empty) as the positional parameters.

You can set the positional parameters using 'set -- ' before
sourcing the script, or you can create a function that sources the
script and gets passed the desired parameters.

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#700232: Bug #700232: dash: Segmentation fault in here-document and syntax error

2013-03-10 Thread Jilles Tjoelker
In bug #700232, you wrote:
> dash -c ': <

Bug#690473: Bug #690473: dash: No diagnostic on echo builtin failure

2013-02-27 Thread Jilles Tjoelker
Hi,

Patching bltin/printf.c to generate an error message does not seem to
make much sense given that the non-zero exit status is set by eval.c.
Why can the error message not be generated there as well? This is also
what I do in FreeBSD sh (and I return exit status 2 instead of 1 in this
situation).

Note that echo and printf are not the only builtins that output things
to stdout and can encounter errors because of that. Some shells ignore
this fact, pretending it is OK if builtins like jobs, set, times and
trap cannot write their output.

ksh93 and mksh appear not to print an error message either.

-- 
Jilles Tjoelker


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#589103: lsof on kFreeBSD

2012-05-27 Thread Jilles Tjoelker
> [lsof on debian kfreebsd]

I don't think it makes much sense to extend linprocfs to be a full Linux
/proc, just to be able to run monitoring tools. This would only be
useful for Debian/kFreeBSD and not for real FreeBSD.

Instead, lsof should obtain its data via sysctl, either directly or via
libprocstat (the library is new in 9.0 and the 9.0 kernel also contains
additions to the sysctl interface needed by libprocstat).

These interfaces are used by 'procstat -f', 'sockstat', 'fstat' and
'fuser' which together provide functionality similar to lsof. (In 8.x
and earlier, fstat needs kmem access but not in 9.0 and later.)

As with the move on Linux from kmem to /proc many years ago, some very
specialized information may have to be given up, but that's worth it.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#651446: dash builtin read has no -s option for silent

2011-12-18 Thread Jilles Tjoelker
> [dash's read does not have -s to turn off echo]

You can turn off echoing using a sequence like

settings=$(stty -g)
stty -echo
IFS= read -r password
stty "$settings"

Some additional code is required to restore the terminal settings even
if SIGINT, SIGQUIT, SIGTERM or similar happens.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#590920: dash does not support multi-digit file descriptors

2011-11-01 Thread Jilles Tjoelker
> [possibly allowing fd>9 in redirections]

Removing the limit of 10 file descriptors accessible to redirections is
not as easy as it seems.

An easy change is to raise the limit from 10 to, say, 24, like mksh has
done. Although this allows access to more file descriptors, it also
increases the 'ulimit -n' necessary to run the shell and the new limit
is still pretty low (not high enough for the broken example in the old
flock man page, for example). File descriptors above this limit are used
(via fcntl(F_DUPFD)) for reading shell scripts, preserving open files
that need to be restored after a redirected command and all other
internal purposes that last longer than a command.

What bash does is to remove the fixed boundary. When the user tries to
overwrite a file descriptor the shell uses internally, it will move that
descriptor to another number before allowing the operation to proceed.
Implementing this will increase code size and complexity.

In ksh93, file descriptors before a redirection operators are limited to
one digit but a special syntax allows accessing more descriptors. A
command like
  exec {in}

Bug#642922: regression: "sh -c" change causes FTBFS

2011-10-09 Thread Jilles Tjoelker
> > POSIX's Shell and Utilities (XCU) 2.12 [1] does say that "[the]
> > environment of the shell process shall not be changed by the utility",
> > and that environment includes open files. My understanding is that
> > dash's new behaviour (and incidentally, ksh93's one) is incorrect.

> As I understand it, the intent of that section was not to prohibit
> changes in process state due to a command like "exec foo" replacing
> the shell process.  The shell execution environment is not changed but
> simply no longer exists once execve() has been called.

The word "process" seems a bit awkward here, yes. For the most part,
POSIX is very careful in writing "shell environment" instead of "shell
process", so that creating a new process can often be avoided by clever
implementations.

Anyway, if the shell has found that it can safely exec a simple command
without forking (or run a subshell without forking), there is no
possibility of any execution in the original shell environment, so there
seems no need to preserve any open files or other resource needed for
such execution.

Note that 2.7 Redirection does not say anything explicit about
preserving the previous content of file descriptors, only that a
redirection on a command (except 'exec') shall be in effect during that
command only. Therefore, the shell only needs to save file descriptors
at high numbers if it may need to execute subsequent commands in the
same process ("may" because a trap for EXIT might be set in the command,
if it is not external).

Also note that 2.9.1 Simple Commands does not explicitly call for
creating a new process. If there are subsequent commands, the shell will
obviously need to create a new process, but otherwise that is not
needed.

> More to the point, POSIX is very easy to change[1], so if this
> behavior is causing a problem in otherwise reasonable application
> code, methinks it would be better to discuss that instead...

The behaviour you are complaining about exists in almost all shells
(tried: FreeBSD /bin/sh, dash, bash, zsh, mksh, ksh93, Heirloom Bourne
shell, kBuild kmk_ash (much like NetBSD /bin/sh)) in some contexts, so
it is unlikely to change.

To demonstrate, where ${SH} is the shell to be tested:

${SH} -c '/bin/sleep 10 >/dev/null &' | { cat; echo EOF; }

The "EOF" line appears immediately in all tested shells, showing that
there is nothing holding onto the pipe in the subshell environment
created by '&'.

On the other hand, if the redirection is 2>/dev/null instead, it takes
ten seconds before "EOF" appears, as in this case the pipe file
descriptor is passed on.

If braces or parentheses are added around the sleep command (including
or not including the redirection), behaviour varies per shell.

Likewise, behaviour for

{ ${SH} -c '/bin/sleep 10 >/dev/null' & } | { cat; echo EOF; }

varies per shell, and I do not see a reason why this should keep the
pipe file descriptor open while shells clearly agree that it should not
be kept open in the first example.

(Note that I write /bin/sleep explicitly, because some shells have a
sleep builtin.)

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#619725: dash: permit special builtin names as function names

2011-04-03 Thread Jilles Tjoelker
> [false || local() { ...; } is a syntax error]

Although I don't think special builtins shall be valid function names, I
think there is still a bug in dash here: 'local' should not be a special
builtin.

When 'local' was made a special builtin, I remarked on the dash mailing
list that that change may not be a good idea:
http://www.mail-archive.com/dash@vger.kernel.org/msg00268.html

I think this is a concrete reason why I was right. 'local' should be a
regular builtin and should be made to work in another way, such as by:
* adding a special case to evalcommand() for 'local' (if (spclbltin > 0
  || argc == 0)), or
* modifying localcmd() so only local variables are only added to
  localvars lists belonging to functions.

Although I have not tried, the former looks easiest.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#595063: read() builtin doesnt read integer value /proc files (but bashs does)

2010-12-18 Thread Jilles Tjoelker
On Wed, Dec 15, 2010 at 12:55:51PM -0600, Jonathan Nieder wrote:
> Cristian Ionescu-Idbohrn wrote:
> > On Sun, 28 Nov 2010, Herbert Xu wrote:
> > > On Sat, Sep 04, 2010 at 07:35:04PM +0000, Jilles Tjoelker wrote:

> >>> This discarding is still bad as it throws away valid data if the open
> >>> file description is shared. This happens if stdin is redirected inside a

> >> I'm with Jilles on this.  I also don't particularly feel like
> >> bloating dash just because of the borked /proc interface when
> >> there is a perfectly adequate work-around in "cat".

> >>value=$(cat /proc/file)

> > I wouldn't call that "a perfectly adequate work-around", but a painful and
> > unadequate work-around.

> For what it's worth, here's what bash does (based on strace):

> 1. Determine whether the file is seekable.  That is, seek using
> SEEK_CUR with offset 0.

> 2. If seekable, read a nice big chunk and then seek back to put the
> file offset back in the right place.  If not seekable, read one byte
> at a time.

> This works in /proc because files in /proc are seekable.

> That said, I don't think borked /proc is a great reason to do this
> (it's a better reason to fix /proc).  Speeding up the read builtin
> might be a good reason.

The optimization is of limited benefit (still way more syscalls than
strictly necessary) and does not apply to the common use case of reading
from a pipe. Generally, if 'read' is too slow, it is better to spend a
fork on a tool like grep, sed or awk which processes large amounts of
text much more efficiently.

As for value=$(cat /proc/file), there are at least two ways to make this
faster. The traditional ksh way is the extension value=$(

Bug#436466: dash: Please optimise single command given to -c to exec it

2010-12-07 Thread Jilles Tjoelker
Drake Wilson wrote:
> - Point out that full tail execution optimization is not as
>   localized as the "anything without shell meta-stuff" that some
>   people think it is, and that partial tail execution optimization
>   can lead to subtle problems if not done carefully.  Cf. a vaguely
>   similar case where I was attempting to use an XSI shell builtin
>   (which I was willing to rely on on the target system) as a single
>   command in a series of commands in a makefile, and GNU Make
>   decided that it didn't look enough like a shell thing and could be
>   "optimized" into an exec, which naturally hosed the command.  Of
>   course, the case of doing exec transformation _in_ the shell is
>   not as bad as that.

I think the gmake problem is caused by a non-compliance to POSIX in the
operating system, possibly combined with an overzealous skipping of the
shell by gmake. It is rather different from the fork/exec problem with
sh -c.

POSIX requires that all standard utilities except special builtins be
available via the exec functions and the utilities that can invoke
user-specified other utilities (env, find, nice, nohup, time, xargs).

In SUSv4, the special builtins are . (dot), : (colon), break, continue,
eval, exec, exit, export, readonly, return, set, shift, times, trap and
unset; none of these are under the XSI option. An implementation is not
required to provide them as external programs (and there seems no point
in doing so). Gmake must know about these and not try to exec them
itself.

Other builtins that do not exist as external programs yet need to be
provided. An easy way is to link a shell script like the below to
/usr/bin/alias, /usr/bin/bg, /usr/bin/cd, /usr/bin/command, etc.:

#!/bin/sh
command -p ${0##*/} "$@"

The 'command -p' serves to avoid looping when the PATH is set to a value
like /usr/bin:%builtin which is an undocumented way to prefer
/usr/bin/FOO to a builtin command FOO. The 'command' utility itself is
not subject to this because dash implements the POSIX requirement that
various (usually builtin) utilities such as 'command' cannot be
overridden via PATH (but unlike the special builtins can be overridden
with a function).

Regarding sh -c optimization, I am in favour of this. Uselessly waiting
'sh -c' processes annoy me. I made the change for FreeBSD 8.0 sh, which
is very similar to dash. The SVN changeset is r194128. The change
appears to work for dash as well.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#591321: error: "value too great for base" while comparing numbers more than 7 with leading zeros

2010-11-07 Thread Jilles Tjoelker
Alexey Knyazev wrote:
> k...@fish-eye:~$ if (( 06 > 06 )); then echo lazhaa; fi
> k...@fish-eye:~$ if (( 07 > 06 )); then echo lazhaa; fi
> lazhaa
> k...@fish-eye:~$ if (( 07 > 07 )); then echo lazhaa; fi
> k...@fish-eye:~$ if (( 08 > 07 )); then echo lazhaa; fi
> bash: ((: 08: value too great for base (error token is "08")
> k...@fish-eye:~$ if (( 09 > 08 )); then echo lazhaa; fi
> bash: ((: 09: value too great for base (error token is "09")
> k...@fish-eye:~$

If you accept that the expressions in the non-standard construct ((FOO))
work the same as the ones in the standard construct $((FOO)), which
seems reasonable, then this is perfectly correct behaviour. Arithmetic
expressions recognize hexadecimal (starting with 0x), octal (starting
with 0) or decimal (starting with 1-9) numbers. Another example: the
following expression is true:
  (( 8 == 010 ))

The numerical comparisons -lt, -le, -eq, -ne, -ge, -gt in test and [
recognize only decimal numbers. Therefore 09 is valid and nine and 010
is ten and not eight.

I will admit that this is messy, but I think changing it now will only
create a bigger mess. POSIX says where octal numbers shall and shall not
be recognized and software generally complies to that.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#566641: bash: wait $pid exits before $pid has terminated

2010-11-07 Thread Jilles Tjoelker
This looks rather similar to #566639 and can be closed for the same
reason. The behaviour is as specified by POSIX and does not seem broken
to me.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#482194: bash: PS1 non-empty for non-interactive shells

2010-11-05 Thread Jilles Tjoelker
Some shells indeed seem to not set PS1 for non-interactive shells.
However, it seems bad to rely on this to detect non-interactive shells.
I don't see language in POSIX that says PS1 should not be set for
non-interactive shells. Some people export PS1 in their shell startup
files, breaking the check. The correct way to detect interactive vs
non-interactive shells is to check if $- contains "i", for example:

  case $- in
  *i*) echo I am interactive ;;
  *)   echo I am not interactive ;;
  esac

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#598239: dash local assignment splits words

2010-10-03 Thread Jilles Tjoelker
Stephen Gildea wrote:
> Using the "local" command to do variable assignment should not do word
> splitting.

Please note that this feature was in dash for some time, but was then
removed.

See Changelog.O in the dash source tree:
On Sun, 23 Nov 1997 12:45:15 +1100, ash 0.3.1-14:
  * Disabled word-splitting for assignment builtins.
On Mon, 24 Nov 1997 16:19:10 +1100:
  * Fixed incorrect assignment builtin code.
On Sun, 15 Jul 2001 17:27:03 +1000, ash 0.3.8-15:
  * Removed assignment builtins since it is at best undefined by the SuS and
also can't be implemented consistently.

Unfortunately, neither the dash git repository nor the dash mailing list
archives I can find go back that old.

When the "treat words that look like assignments specially" behaviour
triggers exactly differs between shells that implement it.

In ksh93 it is fairly consistent: all builtins it applies to are
special, so that when a word looks like such a builtin it is one. (ksh93
doesn't have "local", its "alias" is special, different from what POSIX
requires, and its extension "typeset" is also special.) The command word
may contain quoting but no expansions to be a candidate for this special
treatment.

In bash and mksh, it applies to various non-special builtins that can at
least be overridden with a function. The special behaviour does not
depend on whether the actual builtin is going to be called, or a
user-defined function. The command word must be the literal builtin, no
quoting or expansions are permitted.

Some versions of bash and zsh, configured with certain options, apply
the special casing to all words that look like assignments.

Another effect of the special casing is that tilde expansion is
performed in cases like
  export PATH=foo:~/bin:bar

I agree with the other post that POSIX does not require this behaviour,
and might even prohibit it; the prohibition is probably not intended,
though, and it seems unlikely to me the standard will be changed to
prohibit it -- that would break more than it fixes.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#595063: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-04 Thread Jilles Tjoelker
On Sat, Sep 04, 2010 at 08:20:33PM +0200, Steve Schnepp wrote:
> 2010/9/3 Jilles Tjoelker :
> > This patch assumes that the file descriptor is discarded afterwards (its
> > position does not matter). Therefore the very common construct
> >  while read x; do
> >    ...
> >  done
> > stops working.

> Ohh.. thanks for that, I didn't see it.

> Actually "while read x" continues to work.
> But "reopening the file" doesn't as in :

> read a b < datafile
> echo ${a} ${b}
> read a b < datafile
> echo ${a} ${b}

You're right, it's even stranger than I expected.

> I attached an updated patch that corrects this pb by discarding the
> buffer when opening a new file.

This discarding is still bad as it throws away valid data if the open
file description is shared. This happens if stdin is redirected inside a
while read... loop.

Furthermore, I think constructions like
  read x; cat
and
  read x; (read y); read z
should keep working. This requires that the input's file position be
synced whenever another process may see it (fork/exit). Due to the
highly dynamic character of the shell and the common use of fd 0, this
probably means that you can't do better than syncing after each read
builtin. (For example, 'read' could be overridden with a function after
the third line.)

Another thought:
  exec 3<&0; read x; read y <&3
or even
  sh -c 'read x; read y <&3' 3<&0
Different file descriptors may refer to the same open file description
and the shell may not know this.

-- 
Jilles Tjoelker



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#595063: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-03 Thread Jilles Tjoelker
On Thu, Sep 02, 2010 at 05:02:55PM +0200, Steve Schnepp wrote:
> 2010/9/1 Steve Schnepp :
> > conforming to POSIX isn't a realistic option, would it be possible to
> > have a workaround that doesn't involve an external tool like cat(1) ?

> Hi, I just hacked & attached a little patch away to be able to solve
> this case.
> Feel free to reply with your comments.

> NB: I just targeted dash-0.5.5.1, but it might apply to any version.

This patch assumes that the file descriptor is discarded afterwards (its
position does not matter). Therefore the very common construct
  while read x; do
...
  done
stops working.

A possible fix is to check first if the input supports seeking. If it
does, use the buffering and at the end of the line seek backwards for
the number of bytes remaining in the buffer. If it does not, read one
byte at a time.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#595063: read() builtin doesn't read integer value /proc files (but bash's does)

2010-09-02 Thread Jilles Tjoelker
On Wed, Sep 01, 2010 at 10:10:11AM +0200, Steve Schnepp wrote:
> Hi, I opened bug 595063 on the debian BTS [1] and I was suggested to
> resend the email upstream.

> So I copied the body of the bug below :

> dash's read() builtin seems to read the underlying file 1 char at a
> time. This doesn't work with some files under /proc, since procfs isn't
> fully POSIX compliant.

> [snip]

> After a little digging, it only appears on files that contains just an
> integer value. When asked to read with a non-null offset (*ppos != 0),
> __do_proc_dointvec() just returns 0 (meaning an EOF) as shown on [2].

> I'm aware that the issue isn't strictly a dash one, since it has the
> right to read one character at a time. But since fixing procfs to be
> conforming to POSIX isn't a realistic option, would it be possible to
> have a workaround that doesn't involve an external tool like cat(1) ?

Given that other files in /proc do work, I don't see why the ones that
only contain an integer value cannot be fixed. All the necessary state
to produce the second and further bytes is available.

Choosing a powerful abstraction like a regular file has its
implications.

Note that a change in the file between the single-byte reads will cause
an inconsistent value to be read. This is also the case with regular
files on a filesystem, so it is acceptable.

If single-byte reads are really unacceptable, then the proper way to
read these files needs to be documented, and clear violations that will
not work properly should cause an error (in this case, this means that
reading one byte from offset 0 should fail like reading one byte from
offset 1 does).

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#593426: zsh: Status of background jobs not updated

2010-08-19 Thread Jilles Tjoelker
> [continuing a stopped background job using kill is not reflected in the
> output of jobs]

I think hooking into the kill builtin is the wrong way to fix this.
Instead, use the facilities provided by modern systems which can notify
you if a child process continues. These are si_code CLD_CONTINUED for
SIGCHLD and the WCONTINUED flag and WIFCONTINUED() macro for waitpid().
Some systems do not provide these, and may not even provide queuing and
siginfo for SIGCHLD, so the latter approach seems best. The WCONTINUED
stuff can then be #ifdef'ed out for systems that do not support it.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#587741: dash: shell functions seem to make per-command variable assignments permanent

2010-07-07 Thread Jilles Tjoelker
In your example
  $ dash -c 'noop(){ true;}; x=1; noop; x=broken noop; echo "x is now $x"'
  x is now broken
dash's behaviour is what POSIX requires, and the same can be seen with
bash --posix and ksh93. The relevant section is XCU 2.9.5 Function
Definition Command.

The same thing applies to special builtins such as ':', exec and unset.
For example, these two both print 'good':
  dash -c 'x=bad; x=good :; echo $x'
  dash -c 'x=good; x=bad true; echo $x'
Again bash needs --posix here.

The idea behind special-casing some of the builtins is that the special
builtins must always be builtins and are treated like how the original
Bourne shell treated all builtins, while all other utilities may be
implemented as builtins or external programs and are treated like how
the original Bourne shell treated external programs (except that some of
them affect the shell environment in a way only a builtin can).

There is less historical basis for the treatment of assignments on
function calls, as such assignments had no effect whatsoever in the
original Bourne shell.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#566639: dash: wait $pid exits before $pid has terminated

2010-06-05 Thread Jilles Tjoelker
The early return of 'wait' is described in POSIX XCU 2.11 Signals and
Error Handling: the wait builtin shall return immediately with >128 exit
status if a trap occurs.  It is somewhat hidden as the page for 'wait'
makes no mention of this. 

To avoid this, you can try to 'wait' until the exit status is less than
128. To distinguish between traps and child processes exiting because of
signals, the trap handlers will have to do more magic.

The early termination of 'sleep' happens because sh's termination causes
the process group (formerly containing sh and sleep) to become orphaned,
and at least one process (sleep) in that process group is stopped. It is
sent a SIGHUP and a SIGCONT signal. This is done because there is noone
left to care about the process, and otherwise it would remain stopped
forever. (See _Exit() in XSH and orphaned process group in XBD 3
Definitions.)

Concludingly, there is no bug here. All the behaviour is as described in
POSIX.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#570757: dash: cannot parse multiline parameter expansion

2010-05-30 Thread Jilles Tjoelker
FreeBSD's ash derivative has a similar restriction (newlines inside ${}
must be quoted via single-quotes, double-quotes or here document). I
think this is a feature as it allows detecting syntax errors where you
still have a chance to see the problem easily rather than 100 lines
later, and I have not received any other complaints about this.

The System V shell (e.g. /bin/sh on Solaris <= 10) behaves similarly.

-- 
Jilles Tjoelker



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org