Re: [PATCH 2/6] printf: Support \e in "echo" and "printf" builtins.

2018-09-11 Thread Adam Borowski
On Fri, Sep 07, 2018 at 10:34:10AM +0200, Andrej Shadura wrote:
> From: Adam Borowski 
> 
> dash's builtin version of printf doesn't support '\e' (escape), and literaly
> outputs the 2 characters as-is. As is well known, this sequence is useful,
> for example, when outputting ANSI escape sequences.
> 
> While it seems that POSIX does not require that printf support '\e',
> it is still worth implementing '\e' support, as a way to make the environment
> more consistent and avoid this small issue when porting scripts from certain
> other systems or when migrating from bash to dash.
> 
> Signed-off-by: Adam Borowski 
> [reworded the patch description]
> Signed-off-by: Andrej Shadura 
> Bug-Debian: http://bugs.debian.org/816295

Note that the patch description is lacking (I did not provide one in the
Debian package), let me elaborate.

First, including the patch is somewhat my fault.  \e was discussed before:
https://www.mail-archive.com/dash@vger.kernel.org/msg00881.html with the
patch not taken; when fixing an unrelated problem later in Jan 2017 I looked
at current code base (the Debian package was far behind) and noticed
something that _appeared_ to be \e.  Turns out I did misread that, and in my
excitement uploaded with this patch.

Wrt POSIX: it can't require \e as it believes in an abstract "portable
character set" which doesn't include ESCAPE.  On the other hand, the two
sets in use: ASCII and EBCDIC both have ESCAPE -- both at 27 although
they disagree whether dec or hex.

On the other hand, support for \e is ubiquitous (not listed = I didn't
check):
* Unix C compilers: gcc clang tcc (not MSVC)
* scripting languages: perl python ruby lua php
* other shells: bash mksh sash posh ksh busybox zsh
* Debian/Ubuntu/Mint/Devuan/Raspbian/... dash

Most remarkably, /bin/sh is dash on Debian derivatives (ie, with this patch)
and something else elsewhere (bash, mksh, ksh or busybox).  Thus, scripts do
expect \e to be supported.

> ---
>  src/bltin/printf.c | 1 +
>  src/dash.1 | 4 
>  2 files changed, 5 insertions(+)
> 
> diff --git a/src/bltin/printf.c b/src/bltin/printf.c
> index 7785735..9b878da 100644
> --- a/src/bltin/printf.c
> +++ b/src/bltin/printf.c
> @@ -340,6 +340,7 @@ conv_escape(char *str, int *conv_ch)
>   case '\\':  value = '\\';   break;  /* backslash */
>   case 'a':   value = '\a';   break;  /* alert */
>   case 'b':   value = '\b';   break;  /* backspace */
> + case 'e':   value = '\e';   break;  /* escape */
>   case 'f':   value = '\f';   break;  /* form-feed */
>   case 'n':   value = '\n';   break;  /* newline */
>   case 'r':   value = '\r';   break;  /* carriage-return */
> diff --git a/src/dash.1 b/src/dash.1
> index 32f6ac0..b286f79 100644
> --- a/src/dash.1
> +++ b/src/dash.1
> @@ -1201,6 +1201,8 @@ Subsequent output is suppressed.  This is normally used 
> at the end of the
>  last argument to suppress the trailing newline that
>  .Ic echo
>  would otherwise output.
> +.It Li \ee
> +Outputs an escape character (ESC).
>  .It Li \ef
>  Output a form feed.
>  .It Li \en
> @@ -1570,6 +1572,8 @@ The characters and their meanings are as follows:
>  Write a \*[Lt]bell\*[Gt] character.
>  .It Cm \eb
>  Write a \*[Lt]backspace\*[Gt] character.
> +.It Cm \ee
> +Write an \*[Lt]escape\*[Gt] (ESC) character.
>  .It Cm \ef
>  Write a \*[Lt]form-feed\*[Gt] character.
>  .It Cm \en
> -- 
> 2.17.1


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢠⠒⠀⣿⡁ A dumb species has no way to open a tuna can.
⢿⡄⠘⠷⠚⠋⠀ A smart species invents a can opener.
⠈⠳⣄ A master species delegates.


Re: [PATCH 1/6] exec: Don't execute binary files if execve() returned ENOEXEC.

2018-09-11 Thread Jilles Tjoelker
On Fri, Sep 07, 2018 at 10:34:09AM +0200, Andrej Shadura wrote:
> From: Adam Borowski 

> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.

> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.

> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution."

I think this is worth the extra code, even though POSIX does not require
it.

Neither the file_is_binary function nor its caller take care of
preserving errno, which might lead to confusing error messages.

On another note, this is not fully POSIX-compliant because it rejects
some files that comply to POSIX's definition of a text file. Per POSIX's
definition, only files containing '\0' can be rejected (or containing
too long lines or not ending with a newline, both of which seem like a
bad idea to check); this probably requires checking more than the first
line for decent detection. However, I suppose the benefits and
disadvantages were weighed here and non-compliance was decided.

> Signed-off-by: Adam Borowski 
> Signed-off-by: Andrej Shadura 
> ---
>  src/exec.c | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/src/exec.c b/src/exec.c
> index 9d0215a..631 100644
> --- a/src/exec.c
> +++ b/src/exec.c
> @@ -148,6 +148,36 @@ shellexec(char **argv, const char *path, int idx)
>  }
>  
>  
> +/*
> + * Check if an executable that just failed with ENOEXEC shouldn't be
> + * considered a script (wrong-arch ELF/PE, junk accidentally set +x, etc).
> + * We check only the first line to allow binaries encapsulated in a shell
> + * script without proper quoting.  The first line, if not a hashbang, is
> + * likely to contain comments; even ancient encodings, at least popular
> + * ones, don't use 0x7f nor values below 0x1f other than whitespace (\t,
> + * \n, \v, \f, \r), ISO/IEC 2022 can have SI, SO and \e.
> + */
> +STATIC int file_is_binary(const char *cmd)
> +{
> + char buf[128];
> + int fd = open(cmd, O_RDONLY|O_NOCTTY);
> + if (fd == -1)
> + return 1;
> + int len = read(fd, buf, sizeof(buf));
> + close(fd);
> + for (int i = 0; i < len; ++i) {
> + char c = buf[i];
> + if (c >= 0 && c <= 8 ||
> + c >= 16 && c <= 31 && c != 27 ||
> + c == 0x7f)
> + return 1;
> + if (c == '\n')
> + return 0;
> + }
> + return 0;
> +}
> +
> +
>  STATIC void
>  tryexec(char *cmd, char **argv, char **envp)
>  {
> @@ -162,6 +192,8 @@ repeat:
>   execve(cmd, argv, envp);
>  #endif
>   if (cmd != path_bshell && errno == ENOEXEC) {
> + if (file_is_binary(cmd))
> + return;
>   *argv-- = cmd;
>   *argv = cmd = path_bshell;
>   goto repeat;

-- 
Jilles Tjoelker