Re: printf inconsistent results for %.0f

2024-09-18 Thread Stephane Chazelas
2024-08-14 09:11:05 -0400, Chet Ramey:
> On 8/13/24 7:05 PM, Grisha Levit wrote:
> > On Mon, Aug 12, 2024, 11:04 Chet Ramey wrote:
> > 
> > My question is why the (admittedly old) gnulib replacement 
> > strtod/strtold
> > is messing things up.
> > 
> > 
> > Looks like printf(3) gets called with a `Lf' conversion specifier and
> > a double argument.
> 
> Yes, I came to the same conclusion with an essentially identical fix.
[...]

Would it be possible to have a 5.2 patch released with the
backport of those two fixes?

At the moment, all systems where bash is built with gcc 14 (such
as Debian trixie
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1078556) where
-Werror=incompatible-pointer-types is now the default
(https://gcc.gnu.org/gcc-14/porting_to.html#incompatible-pointer-types)
have a broken bash printf builtin.
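
As a quick smoke test (the exact garbage printed varies by
platform, since an affected printf builtin ends up passing a
double where the format expects a long double):

$ printf '%.0f\n' 1
1

An affected build prints some unrelated value instead of 1.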

Extracted from the "devel" branch on the git repo:

commit 55a224da44768bdcb57603a135657919cf08c2f7
Author: Chet Ramey 
Date:   Fri Nov 24 12:39:17 2023 -0500

fix for fdflags loadable builtin; new strptime loadable builtin; enable -f 
doesn't fall back to current directory if using BASH_LOADABLES_PATH; new 
operator for rl_complete_internal that just dumps possible completions

diff --git a/configure.ac b/configure.ac
index fa5f5747..c25a8088 100644
--- a/configure.ac
+++ b/configure.ac
@@ -898,7 +899,7 @@ AC_CHECK_DECLS([strtold], [
[AC_COMPILE_IFELSE(
[AC_LANG_PROGRAM(
[[#include <stdlib.h>]],
-   [[long double r; char *foo, bar; r = strtold(foo, &bar);]]
+   [[long double r; char *foo, *bar; r = strtold(foo, &bar);]]
)],
[bash_cv_strtold_broken=no],[bash_cv_strtold_broken=yes])
 ]

commit e327891b52513bef0b34aac625c44f8fa6811f53
Author: Chet Ramey 
Date:   Wed Aug 21 16:11:01 2024 -0400

fix for printf with broken strtold; fix readline reading specified number 
of multibyte characters; fix read builtin to deal with invalid utf-8 
continuation character as delimiter; turn off -n if supplied at interactive 
shell invocation

diff --git a/builtins/printf.def b/builtins/printf.def
index 6549e718..18f3f659 100644
--- a/builtins/printf.def
+++ b/builtins/printf.def
@@ -209,11 +209,13 @@ static uintmax_t getuintmax (void);
 typedef long double floatmax_t;
 #  define USE_LONG_DOUBLE 1
 #  define FLOATMAX_CONV"L"
+#  define FLOATMAX_CONVLEN 1
 #  define strtofltmax  strtold
 #else
 typedef double floatmax_t;
 #  define USE_LONG_DOUBLE 0
 #  define FLOATMAX_CONV""
+#  define FLOATMAX_CONVLEN 0
 #  define strtofltmax  strtod
 #endif
 static double getdouble (void);
@@ -782,7 +784,7 @@ printf_builtin (WORD_LIST *list)
floatmax_t p;
 
p = getfloatmax ();
-   f = mklong (start, "L", 1);
+   f = mklong (start, FLOATMAX_CONV, FLOATMAX_CONVLEN);
PF (f, p);
  }
else/* posixly_correct */


Thanks,
Stephane



Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-02 Thread Stephane Chazelas
2023-09-01 23:28:50 +0200, Steffen Nurpmeso via austin-group-l at The Open 
Group:
[...]
>  |FWIW, a "printf %b" github shell code search returns ~ 29k
>  |entries
>  |(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Sh\
>  |ell)
>  |
>  |That likely returns only a small subset of the code that uses
>  |printf with %b inside the format and probably a few false
>  |positives, but that gives many examples of how printf %b is used
>  |in practice.
> 
> Actually this returns a huge amount of false positives where
> printf(1) and %b are not on the same line, let alone the same
> command, if you just scroll down a bit it starts like neovim match
[...]

You're right, I only looked at the first few results, and those
already gave interesting ones.

Apparently, we can also search with regexps; searching for
printf.*%b
(https://github.com/search?q=%2Fprintf.*%25b%2F+language%3AShell&type=code)
is probably a lot more accurate. It returns ~ 19k entries.

(still FWIW, that's still just a sample of random code on the
internet)

[...]
> Furthermore it shows a huge amount of false use cases like
> 
>  printf >&2 "%b\n" "The following warnings and non-fatal errors were 
> encountered during the installation process:"
[...]

Yes, I also see a lot of echo -e stuff that should have been
echo -E (or plain echo in those (many) implementations that
don't expand by default), or that should use the more reliable
printf with %s (not %b).
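
For instance, in bash:

$ echo -E 'a\nb'
a\nb
$ printf '%s\n' 'a\nb'
a\nb

Neither expands the \n, which is generally what you want for
arbitrary data.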

> It seems people think you need this to get colours mostly, which
> then, it has to be said, is also practically mislead.  (To the
> best of *my* knowledge that is.)
[...]

Incidentally, ANSI terminal colour escape sequences are somewhat
connecting those two %b's as they are RGB (well BGR) in binary
(white is 7 = 0b111, red 0b001, green 0b010, blue 0b100), with:

R=0 G=1 B=1
printf '%bcyan%b\n' "\033[3$(( 2#$B$G$R ))m" '\033[m'

(that's for Korn-like shells; in zsh, $(( 0b$B$G$R )) also
works, though zsh has builtin colour output support, including
RGB-based).
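
For instance, to cycle through the 8 base colours with that
scheme (nothing shell-specific here beyond printf %b):

for c in 0 1 2 3 4 5 6 7; do
  printf '%bcolour %d%b\n' "\033[3${c}m" "$c" '\033[m'
done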

Speaking of stackexchange, on the June data dump of
unix.stackexchange.com:

stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%b'
494

(FWIW)

Compared with %d (though that will have entries for printf(3) as well):

stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf.*%d'
3444

-- 
Stephane



Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Stephane Chazelas
2023-09-01 07:54:02 -0500, Eric Blake via austin-group-l at The Open Group:
[...]
> > Well in all case %b can not change semantic in the bash script, since it is
> > there for so long, even if it depart from python, perl, libc, it is
> > unfortunate but that's the way it is, nobody want a semantic change, and on
> > next routers update, see the all internet falling appart :-)
> 
> How many scripts in the wild actually use %b, though?  And if there
> are such scripts, anything we can do to make it easy to do a drop-in
> replacement that still preserves the old behavior (such as changing %b
> to %#s) is going to be easier to audit than the only other
> currently-portable alternative of actually analyzing the string to see
> if it uses any octal or \c escapes that have to be re-written to
> portably function as a printf format argument.
[...]

FWIW, a "printf %b" github shell code search returns ~ 29k
entries
(https://github.com/search?q=printf+%25b+language%3AShell&type=code&l=Shell)

That likely returns only a small subset of the code that uses
printf with %b inside the format and probably a few false
positives, but that gives many examples of how printf %b is used
in practice.

printf %b is also what all serious literature about shell
scripting has been recommending in place of the unportable
echo -e (or XSI echo, or print without -r). That includes the
POSIX standard, which has been recommending printf instead of
the non-portable echo for 30 years.
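
The typical migration that literature recommends being:

echo -e 'foo\tbar'        # unportable: -e and \t handling vary
printf '%b\n' 'foo\tbar'  # POSIX: \t expanded by %b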

So that change will also invalidate all those. It will take a
while before %#s is supported widely enough that %b can be
safely replaced with %#s.

-- 
Stephane



Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Stephane Chazelas
2023-09-01 07:15:14 -0500, Eric Blake:
[...]
> > Note that in bash, you need both
> > 
> > shopt -s xpg_echo
> > set -o posix
> > 
> > To get a XSI echo. Without the latter, options are still
> > recognised. You can get a XSI echo without those options with:
> > 
> > xsi_echo() {
> >   local IFS=' ' -
> >   set +o posix
> >   echo -e "$*\n\c"
> > }
> > 
> > The addition of those \n\c (noop) avoids arguments being treated as
> > options if they start with -.
> 
> As an extension, Bash (and Coreutils) happen to honor \c always, and
> not just for %b.  But POSIX only requires \c handling for %b.
> 
> And while Issue 8 has taken steps to allow implementations to support
> 'echo -e', it is still not standardized behavior; so your xsi_echo()
> is bash-specific (which is not necessarily a problem, as long as you
> are aware it is not portable).
[...]

Yes. None of local (from ash, I believe), the posix option
(several shells have an option called posix, all used to improve
POSIX conformance; bash may have been the first), nor -e (from
Research Unix v8) are standard. That part was about bash
specifically (as the thread is also posted on gnu.bash.bug).

BTW, that xsi_echo is not strictly equivalent to an XSI echo in
the case where the last character of the last argument is an unescaped
backslash, or a character whose encoding ends in the same byte as
the encoding of backslash.

-- 
Stephane



Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Stephane Chazelas
2023-08-31 15:02:22 -0500, Eric Blake via austin-group-l at The Open Group:
[...]
> The current POSIX says that %b was added so that on a non-XSI
> system, you could do:
> 
> my_echo() {
>   printf %b\\n "$*"
> }

That is dependent on the current value of $IFS. You'd need:

xsi_echo() (
  IFS=' '
  printf '%b\n' "$*"
)

Or the other alternatives listed at
https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819
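
To see why: "$*" joins the positional parameters with the first
character of $IFS, hence the IFS=' ' in the subshell above.

$ set -- a b
$ IFS=-; printf '%s\n' "$*"
a-b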

[...]
> Bash already has shopt -s xpg_echo

Note that in bash, you need both

shopt -s xpg_echo
set -o posix

to get an XSI echo. Without the latter, options are still
recognised. You can get a XSI echo without those options with:

xsi_echo() {
  local IFS=' ' -
  set +o posix
  echo -e "$*\n\c"
}

The addition of those \n\c (a no-op) avoids arguments being treated as
options if they start with -.
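
For instance:

$ xsi_echo -n
-n

Without the appended \n\c, that -n would have been taken as an
option and nothing would have been output.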


[...]
> The Austin Group also felt that standardizing bash's behavior of %q/%Q
> for outputting quoted text, while too late for Issue 8, has a good
> chance of success, even though C says %q is reserved for
> standardization by C. Our reasoning there is that lots of libc over
> the years have used %qi as a synonym for %lli, and C would be foolish
> to burn %q for anything that does not match those semantics at the C
> language level; which means it will likely never be claimed by C and
> thus free for use by shell in the way that bash has already done.
[...]

Note that %q is from ksh93, not bash, and is not portable across
implementations; with most of them, including bash's, it gives
output that is not safe for reinput in arbitrary locales (as it
uses $'...' in some cases). I'm not sure it's a good idea to add
it to the standard; at least it should come with fat warnings
about the risk in using it.

See also:

https://unix.stackexchange.com/questions/379181/escape-a-variable-for-use-as-content-of-another-script/600214#600214

-- 
Stephane



Re: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Stephane Chazelas
2023-09-01 09:44:08 +0300, Oğuz via austin-group-l at The Open Group:
> On Fri, Sep 1, 2023 at 7:41 AM Phi Debian  wrote:
> > My vote is for posix_printf %B mapping to libc_printf %b
> 
> In the shell we already have bc for base conversion. Does POSIX really
> have to support C2x %b in the first place?

Yes, though note:

- that implies forking a process and loading an external
  executable and its libraries
- bc is not always available. It's not installed by default on
  Debian for instance.
- for bases over 16, it uses some unusual representation that
  can't be reused anywhere else (see the example below).
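
For instance, with obase=32, GNU bc prints each base-32 digit as
a space-separated decimal number:

$ echo 'obase=32; 1000' | bc
 31 08

(1000 = 31*32 + 8.)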

A summary of some options for some common POSIX-like shells at
https://unix.stackexchange.com/questions/191205/bash-base-conversion-from-decimal-to-hex/191209#191209

-- 
Stephane



Re: RFC: changing printf(1) behavior on %b

2023-08-31 Thread Stephane Chazelas
2023-09-01 07:13:36 +0100, Stephane Chazelas via austin-group-l at The Open 
Group:
> 2023-08-31 10:35:59 -0500, Eric Blake via austin-group-l at The Open Group:
> > In today's Austin Group call, we discussed the fact that printf(1) has
> > mandated behavior for %b (escape sequence processing similar to XSI
> > echo) that will eventually conflict with C2x's desire to introduce %b
> > to printf(3) (to produce 0b000... binary literals).
> [...]
> 
> Is C2x's %b already set in stone?
> 
> ksh93's printf (and I'd expect ast's standalone printf) has
> %[<width>][.<precision>[.<base>]]d to output a number in an
> arbitrary base which IMO seems like a better approach than
> introducing a new specifier for every base.
[...]

For completeness, several shells also support expanding integers
in arbitrary bases.

Like ksh's

typeset -i2 binary=123

already there in ksh85, possibly earlier, also available in
pdksh and derivatives and zsh.

Originally, when the base number was not specified, the output
base was derived from the first assignment: typeset -i var;
var='2#111' would get you a $var that expands in binary. Looks
like that was discontinued in ksh93, but it's still there in
mksh or zsh.
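
For instance (here with ksh93; mksh behaves the same):

$ ksh -c 'typeset -i2 x=123; echo "$x"'
2#1111011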

And there's also:

$ echo $(( [#2] 16 )) $(( [##2] 16 ))
2#10000 10000

In zsh (note that you don't get 0b10000 upon $(( [#2] 16 ))
after set -o cbases).

If bash added:

printf -v var %..2d 16

à la ksh93, that would bridge that gap.

How to output/expand numbers in bases other than 8, 10, 16 is a
recurring question for bash, with people generally surprised
that it can *input* numbers in any base, but not *output* in any
base.

See
https://unix.stackexchange.com/questions/415077/how-to-add-two-hexadecimal-numbers-in-a-bash-script/415107#415107
https://unix.stackexchange.com/questions/616215/bash-arithmetic-outputs-result-in-decimal
https://unix.stackexchange.com/questions/749988/arbitrary-base-conversion-from-base-10-using-only-builtins-in-bash
to list only a few.
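
For reference, a minimal pure-bash sketch of the missing
direction (the function name and digit set are my own; it
handles bases 2-36 and non-negative integers only):

tobase() { # usage: tobase <base> <number>
  local base=$1 n=$2 digits=0123456789abcdefghijklmnopqrstuvwxyz out=
  (( n == 0 )) && { printf '0\n'; return; }
  while (( n > 0 )); do
    out=${digits:n%base:1}$out  # digit for the current lowest position
    (( n /= base ))
  done
  printf '%s\n' "$out"
}

$ tobase 2 16
10000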

-- 
Stephane



Re: RFC: changing printf(1) behavior on %b

2023-08-31 Thread Stephane Chazelas
2023-08-31 10:35:59 -0500, Eric Blake via austin-group-l at The Open Group:
> In today's Austin Group call, we discussed the fact that printf(1) has
> mandated behavior for %b (escape sequence processing similar to XSI
> echo) that will eventually conflict with C2x's desire to introduce %b
> to printf(3) (to produce 0b000... binary literals).
[...]

Is C2x's %b already set in stone?

ksh93's printf (and I'd expect ast's standalone printf) has
%[<width>][.<precision>[.<base>]]d to output a number in an
arbitrary base, which IMO seems like a better approach than
introducing a new specifier for every base.

$ printf '%..2d\n' 63
111111
$ printf '0b%.8.2d\n' 63
0b00111111
$ printf '%#.8.2d\n' 63
2#00111111

The one thing it can't do, though, is space-padding to the left of the 0b prefix.

printf %b is used in countless scripts, especially the more
correct/portable ones that use it to work around the portability
fiasco that is echo's escape sequence expansion. I can't imagine
it going away. It's hard to imagine the C folks overlooked it;
I'd expect printf %b to be known by any shell scripter.

-- 
Stephane



Re: [PATCH] confusing/obsolete handling of test -t operator (and doc warnings against using -o/-a)

2023-07-08 Thread Stephane Chazelas
2023-07-07 15:52:28 -0400, Chet Ramey:
[...]
> Historical versions of test made the argument to -t optional here. I can
> continue to support that in default mode for backwards compatibility, but
> it will be an error in posix mode.
[...]

I think you may have overlooked the bottom part of my email
(possibly because it was hidden by your MUA as it included
quoted text) that included comments on the code and a patch.

bash hasn't supported [ -t ] as an alias for [ -t 1 ] since 2.02,
and possibly earlier AFAICT: since it started supporting the
POSIX rules, where [ any-non-empty-single-argument ] returns
true, having [ -t ] check whether stdout is a terminal is
not allowed.

The problem here is that some code to support that wasn't
removed when the POSIX rules were implemented. The patch
I suggested just removes that code.

ksh93 does support [ -t ] when the -t is literal:

$ ksh93 -c '[ -t ]' > /dev/null || echo stdout is not a terminal
stdout is not a terminal
$ ksh93 -c '[ "-t" ]' > /dev/null || echo stdout is not a terminal
stdout is not a terminal
$ var=-t ksh93 -c '[ "$var" ]' > /dev/null && echo '$var is non-empty'
$var is non-empty

But there's no point going there, since that would break POSIX
compliance for no good reason: [ -t ] as an alias for [ -t 1 ]
hasn't been supported for decades, so scripts that were doing
[ -t ] would have long been fixed to use [ -t 1 ].
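
For reference, assuming the first command is run from a
terminal:

$ bash -c '[ -t 1 ] && echo terminal || echo not a terminal'
terminal
$ bash -c '[ -t 1 ] && echo terminal || echo not a terminal' | cat
not a terminal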

[...]
> > I also noticed that the fact that -a/-o were deprecated (by POSIX at
> > least) and made for unreliable test expressions was not noted in the
> > manual. So I suggest the patch below:
> 
> I added some language about this, noting that POSIX has deprecated them
> and recommending scripts not use them. Thanks for the suggestion.
[...]

Note that "(" and ")" are also obsoleted by POSIX and as a
result any usage of test with 5 or more arguments (hence why I
flagged them as (DEPRECATED) in the doc patch I was suggesting.

-- 
Stephane



[PATCH] confusing/obsolete handling of test -t operator (and doc warnings against using -o/-a)

2023-07-05 Thread Stephane Chazelas
Hello,

test -t X

Always returns false and doesn't report an error about that
invalid number (beside the point here, but in ksh/zsh, that X is treated
as an arithmetic expression and evaluates to 0 if $X is not set).

While:

test -t X -a Y

returns a "too many arguments" error.

test X -a -t -a Y

returns false (without error and regardless of whether any fd is a tty)
while

test X -a Y -a -t

returns true

While for other unary operators that gives:

$ bash -c 'test X -a -x -a Y'
bash: line 1: test: too many arguments

No big deal, as in all those cases the behaviour is unspecified
by POSIX (non-numeric argument to -t, or more than 4 arguments,
-a deprecated).

It seems to be explained by what looks like a remnant from the
time where [ -t ] was short for [ -t 1 ] in the code of
unary_operator() in test.c

>  /* the only tricky case is `-t', which may or may not take an argument. */

Not anymore.

>  if (op[1] == 't')
>{
>  advance (0);
>  if (pos < argc)
>   {
> if (legal_number (argv[pos], &r))
>   {
> advance (0);
> return (unary_test (op, argv[pos - 1], 0));
>   }
> else
>   return (FALSE);

Maybe the intention was to do an isatty(1) check here instead of
always returning false, but that, and the fact that advance() is
not called, only confuses things.

>   }
>  else
>   return (unary_test (op, "1", 0));

That part is never reached AFAICT as unary_operator() is never
called with pos == argc.

I believe that whole code can go, as -t is now always a unary
operator, and it would be more useful to report an error when
the operand is not a number.

I also noticed that the fact that -a/-o were deprecated (by POSIX at
least) and made for unreliable test expressions was not noted in the
manual. So I suggest the patch below:

diff --git a/doc/bashref.texi b/doc/bashref.texi
index 85e729d5..00fbab69 100644
--- a/doc/bashref.texi
+++ b/doc/bashref.texi
@@ -4215,14 +4215,14 @@ Operator precedence is used when there are five or more 
arguments.
 @item ! @var{expr}
 True if @var{expr} is false.
 
-@item ( @var{expr} )
+@item ( @var{expr} ) (DEPRECATED)
 Returns the value of @var{expr}.
 This may be used to override the normal precedence of operators.
 
-@item @var{expr1} -a @var{expr2}
+@item @var{expr1} -a @var{expr2} (DEPRECATED)
 True if both @var{expr1} and @var{expr2} are true.
 
-@item @var{expr1} -o @var{expr2}
+@item @var{expr1} -o @var{expr2} (DEPRECATED)
 True if either @var{expr1} or @var{expr2} is true.
 @end table
 
@@ -4283,11 +4283,26 @@ Otherwise, the expression is parsed and evaluated 
according to
 precedence using the rules listed above.
 @end enumerate
 
-@item 5 or more arguments
+@item 5 or more arguments (DEPRECATED)
 The expression is parsed and evaluated according to precedence
 using the rules listed above.
 @end table
 
+In the 4 or 5 arguments case, the use of @samp{(}, @samp{)}, binary
+@samp{-a}, and binary @samp{-o} makes for unreliable test expressions. For
+instance @code{test "$x" -a ! "$y"}  becomes a test for whether a
+@samp{!} file exists if @code{$x} is @samp{(} and @code{$y} is
+@samp{)}, and @code{[ -f "$file" -a ! -L "$file" ]} fails with a
+syntax error for a file called @samp{==}. This explains why those
+are deprecated, as they have been in the POSIX specification of the
+@code{test} utility since 2008.
+
+Each invocation of @code{[} / @code{test} should perform a single test
+and several invocations may be chained with the @code{&&} or @code{||}
+shell operators to achieve the same result as the @code{-a} and
+@code{-o} operators reliably as in @code{test "$x" && test ! "$y"} or
+@code{[ -f "$file" ] && [ ! -L "$file" ]} in the examples above.
+
 When used with @code{test} or @samp{[}, the @samp{<} and @samp{>}
 operators sort lexicographically using ASCII ordering.
 
diff --git a/test.c b/test.c
index 2b12197a..e16337a5 100644
--- a/test.c
+++ b/test.c
@@ -476,24 +476,6 @@ unary_operator (void)
   if (test_unop (op) == 0)
 return (FALSE);
 
-  /* the only tricky case is `-t', which may or may not take an argument. */
-  if (op[1] == 't')
-{
-  advance (0);
-  if (pos < argc)
-   {
- if (legal_number (argv[pos], &r))
-   {
- advance (0);
- return (unary_test (op, argv[pos - 1], 0));
-   }
- else
-   return (FALSE);
-   }
-  else
-   return (unary_test (op, "1", 0));
-}
-
   /* All of the unary operators take an argument, so we first call
  unary_advance (), which checks to make sure that there is an
  argument, and then advances pos right past it.  This means that
@@ -603,7 +585,7 @@ unary_test (char *op, char *arg, int flags)
 
 case 't':  /* File fd is a terminal? */
   if (legal_number (arg, &r) == 0)
-   return (FALSE);
+   integer_expected_error (arg);
   return ((r == (int)r) && isatty ((int)r));
 
 case 'n':  /* True if arg has some length. */





Re: syntax error while parsing a case command within `$(...)'

2021-02-14 Thread Stephane Chazelas
2021-02-14 18:02:52 +0700, Robert Elz:
[...]
>   | I guess you are using Bash for so many years,
> 
> Yes, since Paul Fox created and maintained it (version 1).   It allowed
> me to escape from csh.

ITYM Brian Fox. Maybe the confusion comes from zsh's Paul
Falstad.

-- 
Stephane



Re: export loses error

2021-02-14 Thread Stephane Chazelas
2021-02-09 10:23:51 -0500, Chet Ramey:
[...]
> It's the assignment statement that's the oddball here; it's the only place
> where the exit status from a command substitution has any effect. This is a
> POSIX (maybe ksh) invention to provide assignment statements with a useful
> exit status.
[...]

It was already like that in the Bourne shell, the shell that
introduced command substitution in the late 70s.

Here on a PDP11 emulator running Unix V7:

PDP-11 simulator V3.8-1
Disabling XQ
@boot
New Boot, known devices are hp ht rk rl rp tm vt
: rl(0,0)rl2unix
mem = 177856
# false
# echo $?
1
# a=`false`
# echo $?
1
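
The same still holds in current bash:

$ bash -c 'a=$(false); echo "$?"'
1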

-- 
Stephane



Re: man bash-builtins

2021-02-14 Thread Stephane Chazelas
2021-02-12 19:33:33 -0700, ron:
> In the Synopsis section, the builtin `caller` is not included. Several
> keywords are listed as builtins: [, if, until and while.

You're probably referring to the bash-builtins.1 man page shipped
with Debian. If you look at the bottom, you'll see that man page
is from bash 2.05a.

If you look at the CHANGES file in bash-2.05b released in 2002,
you'll see:

> l.  Removed the reserved words from the `bash-builtins' manual
> page.

The doc/builtins.1 in current versions of bash doesn't have the
problem, though it's missing the "readarray" builtin (IMO, the
better name for the "mapfile" builtin) and lists "bash" itself
as a builtin.

"[" itself *is* a builtin, (an alias for "test") not a keyword.

You may want to raise the issue with Debian instead.

You'll notice that doc/builtins.1 is very short, just a header
and then .SO's the relevant section of the bash.1 man page.

That file could easily be automatically generated based on the
output of the "enable" builtin for instance.

Personally, to learn about a bash builtin, I just run

   info bash builtin

-- 
Stephane



[long] [x-y] bash range wildcard behaviour with non-ASCII data is very random especially with globasciiranges on

2021-02-07 Thread Stephane Chazelas
Hello,

I was wondering why in my en_GB.UTF-8 locale, [0-9] matched
"only" on 1044 characters in bash 5.1 while in bash 4.4 it used
to match on 1050 different ones.

It turns out it's because since 5.0, the globasciiranges option
is enabled by default. Then I tried to understand what that
option was actually doing, but the more I tested, the less
sense it made, and the whole thing seems quite buggy to me.

The manual says:

DOC> 'globasciiranges'
DOC>  If set, range expressions used in pattern matching bracket
DOC>  expressions (*note Pattern Matching::) behave as if in the
DOC>  traditional C locale when performing comparisons.  That is,
DOC>  the current locale's collating sequence is not taken into
DOC>  account, so 'b' will not collate between 'A' and 'B', and
DOC>  upper-case and lower-case ASCII characters will collate
DOC>  together.

In the C locale, POSIX defines the collation order as being the
same as the order of characters in the ASCII character set even
if the C locale's charmap is not ASCII like on EBCDIC systems
(and all other characters if any in the C locale's charmap
(which have to be single-bytes) have to sort after ^? the last
ASCII character). On all systems I've ever used, in the C
locale, the charset was ASCII, and the collation order was based
on the byte value of the encoding (strcoll() is equivalent to
strcmp()), even on characters that are undefined (the ones with
encoding 0x80 to 0xff).

Yet, the DOC above doesn't reflect what happens in bash in
multibyte locales (the norm these days), as bash still appears
to (sometimes at least) decode sequences of bytes into the
corresponding character in the user's locale, not in the C
locale and use the locale's collation order.

I should point out that I've since read:
https://lists.gnu.org/archive/html/bug-bash/2018-08/msg00027.html
https://lists.gnu.org/archive/html/bug-bash/2019-03/msg00145.html
(and https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html)
but those barely scratched the surface.

I had a look at the code to try and understand what was going
on, and here are my findings below:

What I found was that the behaviour of [x-y] ranges in wildcards
depended on:

- the setting of the globasciiranges option
- whether the locale uses a single-byte-charset or not
- in locales with multibyte characters
  - whether the pattern and subject contain sequences of bytes
that don't form valid characters
  - whether the pattern and subject contain only single byte
characters
  - whether the wide char value of the characters are in the
0..255 range or not.

(I've not looked at the effect of nocasematch/nocaseglob).

== locales with single-byte charset

Let's take those out of the way as they are the simplest:

In single-byte per character locales (which were common until
the late 90s), [x-y] matches on characters whose byte encoding
is (numerically) between that of x and that of y when
the globasciiranges is on as it is by default since 5.0 (whether
the characters are in the ASCII set or not or whether the
locale's charset is a superset of ASCII or not; in other words,
it has little to do with ASCII).

When globasciiranges is off, [x-y] matches on characters c that
collate between x and y, but with an additional check: if c
collates the same as x but has a byte value that is less than
that of x or collates the same as y but has a byte value greater
than that of y, then it won't be included (a good thing IMO).

== multi-byte charset locales

=== invalid text

First, independently of ranges, bash pattern matching operates on
two different modes whether the input and pattern are valid text
in the locale or not.

If the pattern or subject contains sequences of bytes that don't
form valid characters in the locale, then the pattern matching
works at byte level.

For instance, in a UTF-8 locale.

[[ $string = [é-é]???? ]]

matches on strings that start with é and are followed by exactly
4 characters as long as $string contains valid UTF-8 text, but
if not, the test becomes: [[ $string = [\xc3\xa9-\xc3\xa9]???? ]]
where [\xc3\xa9-\xc3\xa9] matches on byte 0xc3 or bytes 0xa9
to 0xc3 or byte 0xa9 (so bytes 0xa9 to 0xc3) and ? matches a
single byte (including each byte of each valid multibyte
character). For instance:

$ string=$'áé\x80' bash -c '[[ $string = [é-é]???? ]]' && echo yes
yes

as that's the 0xc3 0xa1 0xc3 0xa9 0x80 byte sequence.

[[ é = *$'\xa9' ]] and [[ á = [é-é]*$'\xa1' ]] both match, this
time because the *pattern* is not valid UTF-8.

Or in:

$ LANG=zh_HK.big5hkscs luit
$ bash --norc
bash-5.1$ [[ '*' = [β*] ]] && echo yes
yes
bash-5.1$ [[ $'αwhatever\xff]' = [β*] ]] && echo yes
yes

A pattern meant to match any of two characters also matches
strings of any length, as it becomes a completely different
pattern once applied to a string that contains sequences of
bytes not forming valid characters in the locale.

In that same locale:

bash-5.1$ pat='α\*'
bash-5.1$ [[ 'α*' =

Re: RMS looking for assistance

2020-05-31 Thread Stephane Chazelas
2020-05-27 10:19:35 -0400, Chet Ramey:
> Richard Stallman is looking for a shell programmer to assist with
> modifying a relatively complex script he uses to process messages and
> access a specific URL. The script is in perl, but he is worried about
> security issues processing URLs containing dangerous characters.
> 
> If you are interested in assisting, please contact a...@gnu.org.
[...]

ITYM r...@gnu.org.

Alfred M. Szmidt  is the maintainer of the GNU
networking utilities according to
https://www.gnu.org/people/people.html

-- 
Stephane



Re: test -v for array does not work as documented

2020-02-19 Thread Stephane Chazelas
2020-02-19 17:18:14 +0100, Ulrich Mueller:
[...]
> So, is there any syntax that allows to test if a value has been assigned
> to the array variable? Especially, to distinguish VARNAME=() (empty
> array) from VARNAME being unset?
[...]

You could do:

if typeset -p var 2> /dev/null | grep -q '='; then
  echo var has a value
fi

For namerefs, that tests whether they are referencing a variable;
that variable could still be undeclared (let alone set).
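
For instance, with bash 5.x:

$ bash -c 'declare -a var=(); typeset -p var 2> /dev/null | grep -q = && echo set || echo unset'
set
$ bash -c 'declare -a var; typeset -p var 2> /dev/null | grep -q = && echo set || echo unset'
unset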

-- 
Stephane



Re: Unicode range and enumeration support.

2019-12-25 Thread Stephane Chazelas
2019-12-24 12:16:41 -0500, Eli Schwartz:
[...]
> > Also note that sort -u and sort | uniq are not quite the same, the -u
> > option only considers the key fields when deciding which records (lines)
> > are unique (of course, with no key options, the whole line is the key,
> > in which case they are more or less the same).
> 
> Hmm, is that "more or less" the same, or actually the same? Seems like
> it would be actually the same... in which case I'd rephrase it to say
> "sort -u can do things that uniq can't, because it takes an optional key
> modifier". (uniq does have -s / -f which is sort of kind of partially
> approaching the same thing, but eh, doesn't really count.)
[...]

It depends on the implementation.

sort is meant to compare strings with strcoll(), that is as per
the locale's collation rules. If two strings collate the same,
some implementations will resort to a strcmp()-type comparison,
some won't.

So sort -u will report the first (or possibly last) of any
sequence of lines that sort the same, and whether that first one
is the one with the lowest byte values or not will depend on the
implementation.

uniq itself is meant to do byte-to-byte comparison instead of
strcoll(), so sort | uniq on an input that contains different
strings that collate the same could very well give random
results. sort | uniq with a POSIX uniq would only work correctly
with sort implementations that resort to a byte to byte
comparison for lines that collate the same.

GNU uniq does use strcoll() instead of strcmp(). As such, it's
not POSIX compliant, but that means that sort -u works the same
as sort | uniq there.

As an example. Here on a GNU system (glibc 2.30, coreutils 8.30)
and in the en_GB.UTF-8 locale:

$ perl -C -le 'no warnings; print chr$_ for 0..0xd7ff, 0xe000..0x10ffff' | wc -l
1112065
$ perl -C -le 'no warnings; print chr$_ for 0..0xd7ff, 0xe000..0x10ffff' | sort -u | wc -l
50714

That is, out of the 1M+ characters in Unicode, GNU sort only
considers 50k distinct ones. Note that it used to be a lot worse
than that.

$ cat c   
🧝
🧜
🧙
🧛
🧝
🧚
$ u < c
U+1F9DD ELF
U+1F9DC MERPERSON
U+1F9D9 MAGE
U+1F9DB VAMPIRE
U+1F9DD ELF
U+1F9DA FAIRY
$ sort -u c
🧝
$ sort c | uniq
🧝


Those characters have not been assigned any sort order, and end
up sorting the same. 

The GNU sort algorithm is "stable" in that it keeps the original
order for lines that have equal sort keys. So here, we get the
merperson because it happens to be the first in the input.

You can see GNU uniq is not POSIX, as there are 5 different lines
in the input but it returns only one. Even if it were POSIX, it
would fail to remove the duplicate Elf, as the two are not adjacent.

Now, let's look at the heirloom toolchest tools:

$ sort c
🧝
🧚
🧛
🧝
🧙
🧜
$ sort -u c
🧜
$ sort c | uniq
🧝
🧚
🧛
🧝
🧙
🧜


That sort is not stable, so we get some random order on those
lines with identical sort order. It's uniq which here is POSIX
compliant, failed to remove the duplicate Elf as the Elves were
not adjacent.

Since the 2018 edition of the standard, it's recommended that
locales that don't have a @ in their name should have a total
ordering of all characters and that sort/ls/globs (globs being
the only thing on topic here)... should do a last-resort
strcmp()-like comparison for lines that collate the same.

The next major release will make it a requirement.

See:

http://austingroupbugs.net/view.php?id=938
http://austingroupbugs.net/view.php?id=963
http://austingroupbugs.net/view.php?id=1070

For now, use:

- sort -u
  to get one of each set of lines that sort the same (which one
  you get is undefined)
- LC_ALL=C sort -u
  or
  LC_ALL=C sort | LC_ALL=C uniq
  to get unique lines (sorted by byte value)
- LC_ALL=C sort -u | sort
  to get unique lines sorted as per the collation's sort order
  (note that the order may not be deterministic for lines that
  collate equally)

sort | uniq itself can't be used reliably outside the C locale.
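
Back to the emoji file above, for instance:

$ sort -u c | wc -l
1
$ LC_ALL=C sort -u c | wc -l
5

(6 input lines, 5 distinct characters.)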

For the record, that "u" was:

u() {
  perl -Mcharnames=full -Mopen=locale -lne '
printf "U+%04X %s\n", ord($_), charnames::viacode(ord($_)) for /./g' "$@"
}

-- 
Stephane




Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution

2019-11-18 Thread Stephane Chazelas
2019-11-18 20:46:26 +0000, Stephane Chazelas:
[...]
> > printf -v B '\u204B'
> > set -- ${B//?()/ }
> > echo "${@@Q}"   #-> $'\342' $'\201' $'\213'
[...]
> It seems to me that zsh's approach is best:
> 
> $ A=$'\u2048\201\u2048' zsh  -c "printf '%q\n' \"\${A//$'\201'/:}\""
> ⁈:⁈
> 
> That is replace that \201 byte, except when it's part of a
> properly encoded character.
[...]

Actually, zsh would also break a character if the byte to be
replaced is the first of the character:

$ A=$'\u2048\342\u2048' zsh -c "printf '%q\n' \"\${A//$'\342'/:}\""
:$'\201'$'\210'::$'\201'$'\210'

Note that in charsets like BIG5/GB18030... which have characters
whose encoding contains the encoding of other characters, bash
seems to behave better than in UTF-8.

For instance the encoding of é in BIG5-HKSCS is 0x88 0x6d where
0x6d is also the encoding of "m" like in ASCII.

$ printf é | iconv -t big5-hkscs | od -tc -tx1
0000000 210   m
         88  6d
0000002
$ LC_ALL=zh_HK.big5hkscs luit
$ U=Stéphane bash -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane ksh93 -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//m}"'
Stéphane

All 3 shells OK, but:

$ U=Stéphane bash -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane ksh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane zsh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane

All 3 shells "break" that é character there.

-- 
Stephane



Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution

2019-11-18 Thread Stephane Chazelas
2019-11-17 01:25:31 -0800, Chris Carlen:
[...]
> # write 'REVERSE PILCROW SIGN' to B, then repeat as above:
> printf -v B '\u204B'
> set -- ${B//?()/ }
> echo "${@@Q}"   #-> $'\342' $'\201' $'\213'
> 
> # NOTE: Since there is only one character (under the UTF-8 locale),
> # this should have set only the first positional parameter with the
> # character REVERSE PILCROW SIGN, not split it into bytes (AFAIK).
[...]

Yes, the question is where to resume searching after a match of
an empty string in ${var//pattern/replacement}.

Note that it's even worse in ksh93 where bash copied that syntax
from:

$ A=$'\u2048\u2048' ksh93 -c 'printf "%q\n" "${A//?()/:}"'
$':\u[2048]:\x81:\x88:\u[2048]:\x81:\x88:'

(here with ksh93u+)

Then there's the question of what

${B/$'\201'/}

should do. Should that $'\201' match the byte component of the encoding of
U+204B?

It seems to me that zsh's approach is best:

$ A=$'\u2048\201\u2048' zsh  -c "printf '%q\n' \"\${A//$'\201'/:}\""
⁈:⁈

That is replace that \201 byte, except when it's part of a
properly encoded character.

Compare with:

$ A=$'\u2048\201\u2048' bash  -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\342:\210:\342:\210'

$ A=$'\u2048\201\u2048' ksh93  -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\u[2048]:\x88:\u[2048]:\x88'

(or yash which can't deal with that \201 byte at all as it can't
form a valid character).

-- 
Stephane



Re: Parameter expansion resulting empty treated as if it's not empty

2019-10-30 Thread Stephane Chazelas
2019-10-30 14:12:41 +0300, Oğuz:
[...]
> I was expecting
> 
> bash -c 'echo ${1##*"${1##*}"}' _ foo
> 
> to print an empty line too, but instead it prints foo.
[...]
> Is this a bug?

Yes,

In gdb, we see the ${1##*} expands to \177 (CTLNUL) as a result
of quote_string(). And that's used as is in the outer pattern.

It looks like an "unquoting" may be missing in that case.

See also:

$ bash -c 'printf %s "${2%%"${1##*}"*}"' bash foo $'x\177foo' | hd
00000000  78                                                |x|
00000001

It seems it's a regression, introduced in 4.0.

-- 
Stephane




Re: Unexpected sourcing of ~/.bashrc under ssh

2019-10-25 Thread Stephane Chazelas
Seems to be down to:

bash-5.0$ printenv SHLVL
1
bash-5.0$ printenv SHLVL | cat
0

Possibly a consequence of the fix for
https://lists.gnu.org/archive/html/bug-bash/2016-09/msg0.html

-- 
Stephane




Re: bash sets O_NONBLOCK on pts

2019-10-03 Thread Stephane Chazelas
2019-10-03 13:58:40 -0400, Chet Ramey:
> On 10/2/19 11:38 AM, Stephane Chazelas wrote:
> 
> > BTW, what's the point of the check_dev_tty() function? It seems
> > it just attempts to open the tty (the controlling one or the one
> > open on stdin), closes it, but doesn't return anything about the
> > success of failure in doing so.
> 
> It's to make up for an old bug that's probably gone everywhere. Back
> in the day, there were systems and services that would start interactive
> shells without a controlling terminal. Opening and closing /dev/tty
> forced the controlling terminal to be allocated.
[...]

Thanks.

I suspected it may be something like that. Though I thought,
without actually double-checking it, that the rest of the code
would end up opening the terminal device (without O_NOCTTY) anyway.

It may be worth adding a comment to document that purpose of the
function, and maybe rename it to something like
control_tty_if_not_already_controlled or something that more
accurately describes its intent.

-- 
Stephane




Re: bash sets O_NONBLOCK on pts

2019-10-02 Thread Stephane Chazelas
2019-10-03 02:49:36 +0900, Andrew Church:
> >Well, it's not so uncommon, I had it a few times. Reading on internet
> >it seems that other users have it but don't notice it.
> 
> The fault could be in some other program accessing the terminal.  Bash
> does not clear O_NONBLOCK on displaying a prompt, so if a previously
> executed program sets O_NONBLOCK on stdin and then exits, that state
> will remain until some other program unsets it.  For example:
> 
> $ cat >foo.c
> #include <fcntl.h>
> int main(void) {fcntl(0, F_SETFL, O_NONBLOCK); return 0;}
> ^D
> $ cc foo.c
> $ ./a.out
> $ cat
> cat: -: Resource temporarily unavailable
[...]

Good point.

I see a difference between versions of bash there:

With GNU dd:

~$ bash5 --norc
bash5-5.0$ dd iflag=nonblock
dd: error reading 'standard input': Resource temporarily unavailable
0+0 records in
0+0 records out
0 bytes copied, 0.000150515 s, 0.0 kB/s
bash5-5.0$ cat
^C
bash5-5.0$ exit

~$ bash --norc
bash-4.4$ dd iflag=nonblock
dd: error reading 'standard input': Resource temporarily unavailable
0+0 records in
0+0 records out
0 bytes copied, 0.000126312 s, 0.0 kB/s
bash-4.4$ cat
cat: -: Resource temporarily unavailable

In bash5, with strace, we see:

fcntl(0, F_GETFL)   = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
fcntl(0, F_SETFL, O_RDWR|O_LARGEFILE)   = 0

That seems to be done by sh_unset_nodelay_mode()

Which points to this change:

commit bc371472444f900d44050414e3472f7349a7aec7
Author: Chet Ramey 
Date:   Mon Jan 30 15:50:08 2017 -0500

commit bash-20170127 snapshot

diff --git a/CWRU/CWRU.chlog b/CWRU/CWRU.chlog
index b8436d64..74a0463e 100644
--- a/CWRU/CWRU.chlog
+++ b/CWRU/CWRU.chlog
@@ -13027,3 +13027,21 @@ subst.c
  after reading a double-quoted string, make sure the W_NOCOMSUB and
  W_NOPROCSUB flags make it to the recursive invocation.  Fixes bug
  reported by Jens Heyens 
+
+  1/23
+  
+lib/readline/signals.c
+   - _rl_orig_sigset: original signal mask, set and restored by
+ rl_set_signals (rl_clear_signals doesn't block signals).  If we
+ are not installing signal handlers, just save signal mask each
+ time rl_set_signals is called
+
+lib/readline/input.c
+   - rl_getc: use _rl_orig_sigmask in the call to pselect(), so we block
+ the set of signals originally blocked by the calling application.
+ Fixes bug reported by Frédéric Brière 
+
+parse.y
+   - yy_readline_get: try to unset NONBLOCK mode on readline's input
+ file descriptor before calling readline(). Inspired by report from
+ Siteshwar Vashisht 



Given that the OP is running 5.0.7, they should have that change already.

Maybe the fcntl(O_NONBLOCK) is done by a command run by a completion
widget.

-- 
Stephane




Re: bash sets O_NONBLOCK on pts

2019-10-02 Thread Stephane Chazelas
2019-10-02 14:27:48 +0200, Matteo Croce:
[...]
> Sometimes bash leaves the pts with O_NONBLOCK set, and all programs
> reading from stdin will get an EAGAIN:
[...]

Can you reproduce it with

   bash --norc

Or with:

   INPUTRC=/dev/null bash --norc

?

If you could reproduce it with:

   strace -o strace.log bash --norc

that would allow us to see where a O_NONBLOCK flag is set and
not reset.

BTW, what's the point of the check_dev_tty() function? It seems
it just attempts to open the tty (the controlling one or the one
open on stdin), closes it, but doesn't return anything about the
success of failure in doing so.

On my system (Debian amd64, 5.0.3(1)-release, bash started from
a regular terminal emulator), it's the only place where I see
O_NONBLOCK being used (and that's on a new fd that is closed
straight after, so it could not have any bearing on the OP's
issue).

-- 
Stephane




Re: Wildcard expansion can fail with nonprinting characters

2019-09-30 Thread Stephane Chazelas
2019-09-30 15:35:21 -0400, Chet Ramey:
[...]
> The $'\361' is a unicode combining
> character, which ends up making the entire sequence of characters an
> invalid wide character string in a bunch of different locales.
[...]

No, $'\u0361', the unicode character 0x361 (hex) is "COMBINING
DOUBLE INVERTED BREVE" (encoded as \315\241 in UTF-8)

But $'\361' is byte value 0361 (octal). In UTF-8, on its own
it's an invalid byte sequence. That's 2#11110001, which would be
the first byte of a 4-byte-long character (of characters U+40000
to U+7FFFF). In latin1, that's ñ (LATIN SMALL LETTER N WITH
TILDE).

So $'foo\361bar' is not text in UTF-8, but that's an encoding
issue, not a problem with combining characters.

$ locale charmap
UTF-8
$ printf '\u361' | od -An -to1
 315 241
$ printf '\U40000' | od -An -vto1
 361 200 200 200
$ printf 'foo\361bar' | iconv -f utf8
fooiconv: illegal input sequence at position 3

-- 
Stephane




Re: Wildcard expansion can fail with nonprinting characters

2019-09-27 Thread Stephane Chazelas
2019-09-27 16:52:50 -0700, Geoff Kuenning:
[...]
> $ mkdir /tmp/test$'\361'dir
> $ touch /tmp/test�dir/foo
> $ ls /tmp/test�dir/f*
> /tmp/test?dir/foo
> $ x=/tmp/test�dir
> $ echo "$x" | cat -v
> /tmp/testM-qdir
> $ ls "$x"/f*
> ls: cannot access '/tmp/test'$'\361''dir/f*': No such file or directory
[...]

I can reproduce on Ubuntu 18.04  with bash 4.4 in UTF-8 locales
and some other but not all multi-byte character locales
(ja_JP.eucjp, but not zh_CN.gb18030 for instance).

I can't reproduce in 5.0.10. Presumably, the issue has been
fixed already.

-- 
Stephane



Re: [PATCH] docs: More hints on #! parsing

2019-09-25 Thread Stephane Chazelas
2019-09-25 15:33:24 -0500, Eric Blake:
[...]
>  Bash scripts often begin with @code{#! /bin/bash} (assuming that
>  Bash has been installed in @file{/bin}), since this ensures that
>  Bash will be used to interpret the script, even if it is executed
> -under another shell.
> +under another shell.  Another common practice is the use of
> +@code{#!/bin/env bash} to find the first instance of bash on @env{PATH}.
[...]

env is more commonly found in /usr/bin. There's no env in /bin
on Ubuntu or FreeBSD for instance.

Using "#! /bin/bash -" is also good practice (so your script
still works when its path starts with - or +, see also
https://unix.stackexchange.com/questions/351729/why-the-in-the-bin-sh-shebang
for more historical context).
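
To illustrate the point of the trailing - (a contrived example;
the exact failure mode and error text vary with the sh
implementation):

$ printf '#! /bin/sh\necho hi\n' > ./-script && chmod +x ./-script
$ python3 -c 'import os; os.execv("-script", ["-script"])'

Here the kernel ends up running /bin/sh -script, and sh parses
-script as a bundle of options; with #! /bin/sh -, it would run
/bin/sh - -script and take -script as the script path.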

-- 
Stephane



[repost] "precision" of $SECONDS

2019-09-16 Thread Stephane Chazelas
That's a re-post of a bug report I raised a few years ago; it
comes back now and then on various Q&A sites I participate in.

The discussions kind of trailed off last time.
https://www.mail-archive.com/bug-bash@gnu.org/msg17783.html

2016-02-24 15:16:41 +, Stephane Chazelas:
> $ time bash -c 'while ((SECONDS < 1)); do :; done'
> bash -c 'while ((SECONDS < 1)); do :; done'  0.39s user 0.00s system 99% cpu 
> 0.387 total
> 
> That can take in between 0 and 1 seconds. Or in other words,
> $SECONDS becomes 1 in between 0 and 1 second after the shell was
> started.
> 
> The reason seems to be because the shell records the value
> returned by time() upon start-up and $SECONDS expands to
> time()-that_saved_time. So, if bash is started at 10:00:00.999,
> then $SECONDS will become 1 only a milisecond after startup
> while if it's started at 10:00:01.000, $SECONDS will become 1 a
> full second later.
> 
> IMO, it would be better if gettimeofday() or equivalent was used
> instead of time() so that $SECONDS be incremented exactly one
> second after start-up like ksh93 does.
> 
> mksh and zsh behave like bash (I'll raise the issue there as
> well).
> 
> With zsh (like in ksh93), one can do "typeset -F SECONDS" to
> make $SECONDS floating point, which can be used as a work around
> of the "issue".
[...]

Note that since then the corresponding bugs in mksh and zsh have
been fixed.
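
FWIW, bash 5.0 also introduced $EPOCHREALTIME, with microsecond
granularity, which can be used to sidestep the issue when
measuring elapsed time (a sketch, assuming a locale where the
radix character is "."):

start=$EPOCHREALTIME
# ... work ...
awk -v a="$start" -v b="$EPOCHREALTIME" 'BEGIN { printf "elapsed: %.6fs\n", b - a }'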

-- 
Stephane



Re: Pathname expansion vs. filename expansion

2019-08-20 Thread Stephane Chazelas
2019-08-20 16:30:21 +0100, Stephane Chazelas:
[...]
> See also "filename generation" or "globbing" which avoid the
> potential confusion with ~user and <(...) which also are
> "pathname expansion" operators.
[...]

FWIW, in zsh's manual, "filename expansion" refers to ~user, ~1,
~-1, ~named-directory, =cmd while "filename generation" refers
to globbing.

In dash: Pathname Expansion (File Name Generation)

In mksh: file name generation

In ksh93: File Name Generation

In yash: Pathname expansion (globbing)

In tcsh: Filename substitution

In fish: globbing

-- 
Stephane



Re: Pathname expansion vs. filename expansion

2019-08-20 Thread Stephane Chazelas
2019-08-20 16:15:42 +0100, Stephane Chazelas:
[...]
> https://www.gnu.org/prep/standards/html_node/GNU-Manuals.html#GNU-Manuals
> 
> GNU> Please do not use the term “pathname” that is used in Unix
> GNU> documentation; use “file name” (two words) instead. We use the
> GNU> term “path” only for search paths, which are lists of directory
> GNU> names. 
> 
> So I guess that should be "file name expansion"
> 
> That's probably not the right place to argue whether that GNU
> recommendations makes sense, but note that the FTP RFC (1985
> https://www.ietf.org/rfc/rfc959.txt, so predates POSIX if not
> the GNU project) defines pathname as
[...]

Actually, Unix V1 in 1970 already used "pathname" for that, long
before the concept of the environment (let alone $PATH) was
introduced.

I don't know why rms insists on using "file name" here which at
best is ambiguous.

See
https://lists.gnu.org/archive/html/bug-standards/2009-11/msg3.html
and rms response:
https://lists.gnu.org/archive/html/bug-standards/2009-11/msg5.html

See also "filename generation" or "globbing" which avoid the
potential confusion with ~user and <(...) which also are
"pathname expansion" operators.

-- 
Stephane



Re: Pathname expansion vs. filename expansion

2019-08-20 Thread Stephane Chazelas
2019-08-20 10:08:10 -0400, Chet Ramey:
[...]
> However, at some point -- I can't find it now -- the GNU documentation
> standards recommended using "filename" and "filename expansion," reserving
> "pathname" for colon-separated values like $PATH.
[...]

I think you're refering to:

https://www.gnu.org/prep/standards/html_node/GNU-Manuals.html#GNU-Manuals

GNU> Please do not use the term “pathname” that is used in Unix
GNU> documentation; use “file name” (two words) instead. We use the
GNU> term “path” only for search paths, which are lists of directory
GNU> names. 

So I guess that should be "file name expansion"

That's probably not the right place to argue whether that GNU
recommendations makes sense, but note that the FTP RFC (1985
https://www.ietf.org/rfc/rfc959.txt, so predates POSIX if not
the GNU project) defines pathname as

pathname

  Pathname is defined to be the character string which must be
  input to a file system by a user in order to identify a file.
  Pathname normally contains device and/or directory names, and
  file name specification.  FTP does not yet specify a standard
  pathname convention.  Each user must follow the file naming
  conventions of the file systems involved in the transfer.


-- 
Stephane



Re: Filename expansion bug

2019-08-08 Thread Stephane Chazelas
2019-08-08 10:38:48 -0400, Greg Wooledge:
[...]
> > shopt -s failglob
> > command="echo xyz\(\)"
> > $command
> > ```
> > 
> > And it was working fine. But somewhere between bash version 4 and 5 I
> > realized it generates an error:
> > no match: xyz\(\)
[...]
> And my personal response for this variant of the issue: stop putting
> shell commands in string variables and then trying to run them with
> unquoted parameter expansion.  See:
[...]

While I'd agree in this case, note that the austin-group thread
was mentionning a very similar case:
https://www.mail-archive.com/austin-group-l@opengroup.org/msg04213.html

That was in autoconf's configure script (including the one
shipped with bash) that is /broken/ by that change in 5.0:

as_echo='printf %s\n'
$as_echo x

now runs:

printf '%sn' x

if there's a file called %sn in the current directory and

printf '%s\n' x

otherwise

(and would cause an error with failglob and run printf x with
nullglob).

In that case, it's not because the autoconf authors don't know
any better, but because at that point in the script, where
$as_echo is defined, they're trying to support ancient shells
that don't have function support.

-- 
Stephane



Re: Setting nullglob causes variables containing backslashes to be expanded to an empty string

2019-08-06 Thread Stephane Chazelas
[re-post via gmane as the usenet interface seems not to work
again. My posts can be seen at
https://groups.google.com/forum/#!topic/gnu.bash.bug/0JgBRq_778o
but were apparently not forwarded to the mailing list]

2019-08-06 16:00:21 -0400, Greg Wooledge:
> On Tue, Aug 06, 2019 at 06:18:27PM +, Mohamed Akram wrote:
> > Bash version: GNU bash, version 5.0.7(1)-release (x86_64-apple-darwin18.5.0)
> >
> > Example:
> >
> > shopt -s nullglob
> > a='\x30'
> > echo $a
> >
> > Expected output:
> >
> > \x30
> >
> > Actual output:
> >
>
> Also happens in bash 5.0 on Debian GNU/Linux.  It does not happen in
> bash 4.4 or earlier (I tried back to 3.2) on the same machine.
[...]

That is being discussed on the austin-group mailing list (and has
been discussed here before as well IIRC).

The idea is that in 5.0, \ became a globbing quoting operator.

So with nullglob, the \x30 expands to x30 when there's a file
called x30 in the current directory and nothing if not.

That is by design and was supported until relatively recently by
some Austin Group people (the guys behind POSIX). That's not
done by any other major shell.

See http://austingroupbugs.net/view.php?id=1234 and the very
long discussions that follow on the mailing list:

See for instance
https://www.mail-archive.com/austin-group-l@opengroup.org/msg04237.html

As seen there, with the current head of the  devel branch, that
behaviour can be disabled by turning off the posixglob option.

$ a='\x30' ./bash +O posixglob -O nullglob -c 'printf "%s\n" $a'
\x30
$ a='\x30' ./bash -O nullglob -c 'printf "%s\n" $a'

$ touch x30
$ a='\x30' ./bash -O nullglob -c 'printf "%s\n" $a'
x30


In any case, yes, do remember to quote your variable expansions
and not use echo for arbitrary data.

-- 
Stephane




Re: x[

2019-07-29 Thread Stephane Chazelas
2019-07-29 17:55:58 +0100, Isabella Bosia:
> haven't really looked into why this happens but x[ seems to trigger some
> funny parser behavior
> 
> x[ newline should not prompt with PS2
> 
> it can't be defined as a normal sh function, but it can be defined with the
> function keyword
[...]

x[ is the start of an array element assignment. newline is valid (just a token
separator like space) inside an arithmetic expression.

$ x[
> 1
> +
> 1
> ]=3
$ typeset -p x
declare -a x=([2]="3")

You'll notice:

$ +[
+[: command not found
$ 'x'[
x[: command not found

-- 
Stephane



Re: T/F var expansion?

2019-07-29 Thread Stephane Chazelas
2019-07-28 21:17:43 -0700, L A Walsh:
> Is there a T/F var expansion that does:
>  
> var=${tst:+$yes}${tst:-$no}
> 
> but with yes/no in 1 expansion?
[...]

You can also do:

no_yes=(no yes)
echo "${no_yes[${var+1}]}"

For the reverse:

echo "${no_yes[!0${var+1}]}"

See also:

map=(unset empty non-empty)
echo "${map[${var+1}+0${var:+1}]}"

-- 
Stephane




Re: Incorrect option processing in builtin printf(1)

2019-07-22 Thread Stephane Chazelas
2019-07-22 14:55:05 -0500, Eric Blake:
[...]
> > Even if POSIX didn't mandate
> > 
> > printf -- -%s x
> > 
> > to output -x, I'd say it would be a bug in the POSIX
> > specification (it looks like it is).
> 
> POSIX _does_ mandate 'printf -- -%s x' to output exactly '-x', by virtue
> of the fact that it mandates all utilities (other than special builtins)
> with the specification 'OPTIONS None.' to parse and ignore '--' as the
> end of options, whether or not the utility takes options as an
> extension.  If NetBSD broke that behavior, that is a bug in NetBSD's
> shell, not bash nor POSIX.
[...]

Yes, thanks for pointing to the right section. I wish POSIX made
it clearer. Having it spread like that in different sections
with no full linking within them is suboptimal. The special
treatment of special builtins seems bogus to me as well (like
the fact that it seems it makes : "$x" unspecified unless $x can
be guaranteed not to start with -; the description of the colon
utility with its "expands its arguments" is bogus anyway).

-- 
Stephane



Re: Incorrect option processing in builtin printf(1)

2019-07-22 Thread Stephane Chazelas
2019-07-23 00:56:59 +0700, k...@munnari.oz.au:
[...]
>   POSIX specifies that printf(1) has no options, and by not
>   specifying that it is intended to comply with XBD 12.2 effectivly
>   says that it is not.   That is, in printf, the first arg is
>   always the format string, whatever it contains.
[...]

If that were the case, then that would be a bug in the POSIX
specification. I can't find a single printf implementation
where printf -- outputs -- (I tried bash, zsh, ksh93, GNU, yash,
busybox, busybox ash, Solaris /bin/printf).

Even if POSIX didn't mandate

printf -- -%s x

to output -x, I'd say it would be a bug in the POSIX
specification (it looks like it is).

-- 
Stephane



Re: $? is -1

2019-07-05 Thread Stephane Chazelas
2019-07-05 09:03:06 -0400, Chet Ramey:
[...]
> > Second, when ran in a subshell, the same exit status gets mapped to 255.
> > While logical, as -1 gets mapped to 255, it seems inconsistent.
> > ( from the manual: "The return status is the exit status of list." )
> 
> It's the difference between passing a status around the shell and passing
> it through the waitpid() interface, which takes just eight bits.
[...]

Note though:

$ bash -c 'f() { return "$1"; }; f -1; echo "$?"'
255

It gets mapped to 255 even though there's no waitpid().

bash also takes upon itself to truncate the number passed to
exit(1) before passing it to _exit(2):

$ strace -fe exit_group bash -c 'exit -1'
exit_group(255) = ?
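
The truncation is modulo 256; for instance:

$ bash -c 'exit 257'; echo "$?"
1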

There's a lot of variation between shells (and other utilities)
on that front. See also:

https://unix.stackexchange.com/questions/418784/what-is-the-min-and-max-values-of-exit-codes-in-linux/418802#418802
https://unix.stackexchange.com/questions/99112/default-exit-code-when-process-is-terminated/99134#99134

-- 
Stephane



read -ed $'\r' messes up Enter key at the prompt

2019-04-26 Thread Stephane Chazelas
One can use:

IFS= read -i "$var" -red $'\r' var

In bash as the equivalent of zsh's

vared var


To edit the content of a variable (with the added restriction
that $var can't contain CR or NUL characters), using ^V^J to
embed newline characters.

But I find that after I run that command and return to the
prompt, pressing Enter inserts ^M instead of accepting the
current line. It seems it only happens with -d $'\r'

$ INPUTRC=/dev/null ./bash --norc
bash-5.0$ echo "$BASH_VERSION"
5.0.7(3)-maint
bash-5.0$ IFS= read -i "$var" -red $'\r' var
foo
bash-5.0$ echo "$var"^M^M^M^M^M

(those ^M resulting from me pressing Enter that many times; I
can accept the current line by pressing Ctrl+J).

That's on GNU/Linux amd64 with the current git head. Same in 4.4.19.

Another related issue with read -ed ''

bash-5.0$ IFS= read -rep 'prompt> ' -d '' var
prompt> asd
prompt> qwe
prompt>
bash-5.0$ echo "$var"
asdqwe

Entering ^@ doesn't end the "read" but instead reissues a prompt
for more until I enter ^@ on an empty input.

It may also be worth documenting that the argument to -d cannot
be a multi-byte character, or that only the first byte (not
character) of the argument is taken as delimiter.

-- 
Stephane



Re: "sh -a" sets the POSIXLY_CORRECT *environment* variable

2018-08-15 Thread Stephane Chazelas
2018-08-15 11:05:06 -0400, Chet Ramey:
> On 8/14/18 11:50 AM, Stephane Chazelas wrote:
> > Hi,
> > 
> > This is from
> > https://unix.stackexchange.com/questions/462333/why-does-a-in-bin-sh-a-affect-sed-and-set-a-doesnt
> > (original investigation by Mark Plotnick)
> > 
> > Though not documented, enabling the POSIX mode in bash whether
> > with
> > 
> > - bash -o posix
> > - sh
> > - env SHELLOPTS=posix bash
> > - set -o posix # within bash
> > 
> > causes it to set the value of the $POSIXLY_CORRECT shell
> > variable to "y" (if it was not already set)
> 
> Yes. This behavior dates from early 1997. It was put in on request so users
> could get a posix environment from the shell, since GNU utilities
> understand the POSIXLY_CORRECT variable. I could improve the documentation
> there, but a 20-plus-year-old feature isn't going to change.
[...]

Maybe there was a misunderstanding.

It's fine that bash enters POSIX mode when $POSIXLY_CORRECT is
set. IOW, it's fine that bash enters POSIX mode when the users
request it.

The problem I'm trying to raise is about the reverse behaviour:
that bash takes upon itself to request POSIX mode of all other
utilities when it itself enters POSIX mode, that it sets
$POSIXLY_CORRECT when it enters POSIX mode.

The problem would show up mostly with

#! /bin/sh -a

scripts on systems where sh is bash, and where the script relies
on non-POSIX behaviours of some GNU utilities.

I can't see how that could be seen as a feature, I can't imagine
anyone wanting that. If one wants to get a POSIX environment on
a GNU system, they would do:

export POSIXLY_CORRECT=y

(and yes, it's good that bash does honour it) (and yes, it's not
a very good interface as that means it can break scripts called
within that environment and that rely on non-POSIX behaviour of
some utilities, but that's beside the point being made here).

If one wants a POSIX shell, they can use

#! /bin/sh -

Or:

#! /usr/bin/env bash
set -o posix

if they can't rely on /bin/sh being a POSIX shell.

But that should not affect the behaviour of all other utilities
called within the script.

Without "-a", it's OK as the $POSIXLY_CORRECT variable is not
exported (it's not very useful that bash sets it (especially
considering it's not documented), but at least it's harmless).

-- 
Stephane



Re: Unexpected delay in using arguments.

2018-08-14 Thread Stephane Chazelas
2018-08-14 11:25:04 -0400, Chet Ramey:
[...]
> If you build a profiling version of bash, you'll find that about 75% of
> that time is spent copying the list of arguments around, since you have
> to save and restore it each time you call f1. Looking at making that more
> efficient has been a low-level task for a while now.
[...]

To save and restore that list of arguments, you only need to
save and restore one pointer. I don't see why you'd need to copy
the full list of pointers let alone the text of the arguments.

Note that it makes scripts that use functions and receive a
large number of arguments (think of scripts called as find .
-exec myscript {} + for instance) terribly inefficient.

find / -xdev -exec bash -c 'f(){ :;}; for i do f; done' bash {} +

(do nothing in a function for all the files in my root file
system) takes 4 seconds in dash and 9 minutes (135 times as
much) in bash.

-- 
Stephane



"sh -a" sets the POSIXLY_CORRECT *environment* variable

2018-08-14 Thread Stephane Chazelas
Hi,

This is from
https://unix.stackexchange.com/questions/462333/why-does-a-in-bin-sh-a-affect-sed-and-set-a-doesnt
(original investigation by Mark Plotnick)

Though not documented, enabling the POSIX mode in bash whether
with

- bash -o posix
- sh
- env SHELLOPTS=posix bash
- set -o posix # within bash

causes it to set the value of the $POSIXLY_CORRECT shell
variable to "y" (if it was not already set).

What is documented is that when that variable is set, then bash
enters the POSIX mode.

Now, that variable is understood as "enter POSIX mode" for many
GNU utilities and several non-GNU utilities (like those using
the GNU getopt API) not just bash.

When bash enters the POSIX mode while the allexport option is
on,  like in:

- sh -a
- sh -o allexport
- bash -o posix -o allexport...
- env SHELLOPTS=posix,allexport bash
- set -o posix # within bash -a

Then that POSIXLY_CORRECT ends up in the environment of all
commands started by that bash thereafter, which means they start
behaving differently.
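
A minimal reproducer (assuming sh is a link to bash):

$ sh -a -c 'printenv POSIXLY_CORRECT'
y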

IMO, bash's posix mode should only affect bash and its builtin
commands, not the other commands started within bash.

That a #! /bin/sh -a script should make all commands within
behave POSIXly is unexpected and means a script will behave
differently if /bin/sh is bashed on bash or another shell.

Maybe bash should not set POSIXLY_CORRECT, or not export it when
it sets it by itself. If it sets it, the documentation should
reflect it.

Maybe more generally, I think with allexport, bash should not
export the variables that it sets itself. One uses that option
for the variables they set to be available in other scripts
called within. It doesn't make sense to export things like
BASH_VERSION, BASHPID, OPTIND.

Per POSIX, "the export attribute shall be set for each variable
to which an assignment is performed (see Variable Assignment)",
though it goes on about getopts and read even though those are
not listed under "Variable Assignment".

In
set -a
getopts o VAR
printenv VAR

Some shells export VAR, few export OPTIND.

Few export VAR in

set -a
for VAR in 1; do printenv VAR; done

(even though here I'd say it could be desirable).

All export it in "read VAR".
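
For instance (in bash):

$ bash -c 'set -a; read VAR <<< 1; printenv VAR'
1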

In any case, I think the POSIX specification should be
clarified.

-- 
Stephane



[minor] umask 400 causes here-{doc,string} failure

2018-03-11 Thread Stephane Chazelas
Note: sent to bash, zsh and Schily Bourne shell dev mailing
lists (not mksh as my email provider apparently doesn't play
well with mirbsd.org's expensive greylisting, please feel free
to forward there if you don't use gmail).

That's from:
https://unix.stackexchange.com/questions/429285/cannot-create-temp-file-for-here-document-permission-denied

$ bash -c 'umask 400; cat <<< test'
bash: cannot create temp file for here-document: Permission denied
$ zsh -c 'umask 400; cat <<< test'
zsh:1: can't create temp file for here document: permission denied
$ bosh -c 'umask 400; cat << EOF
test
EOF'
bosh: /tmp/sh193220: cannot open
$ mksh -c 'umask 400; cat <<< test'
mksh: can't open temporary file /tmp/sh933f2z.tmp: Permission denied

Those shells use temporary files to store the content of the
here-documents, as the Bourne shell initially did, and open them
a second time in read-only mode to make them the command's stdin
(cat's in the examples above).

When umask contains the 0400 bit, the file is created without
read permission to the user, hence the error upon that second
open().

(note that bosh also leaves the temp file behind in that
case).

I can think of several ways to address it:

1. do nothing and blame the user as the user explicitly asked
for files to be unreadable (but then again, it's not obvious
to the user that heredocs imply a temp file)

2. do like AT&T ksh/tcsh (or yash for big heredocs that don't
fit in the pipe buffer) and open the file only once for both
writing the content and making it the command's stdin (with a
lseek() to beginning in between). That means the fd ends up
being writable though I can't see it being a huge problem. (Yash
actually gives the file 000 permissions here regardless of the
umask with open("/tmp/yash-ECCFE6268", O_RDWR|O_CREAT|O_EXCL,
0), but see below about =(...) emulation)

3. do like dash/yash/rc/es and use a pipe instead of a temp
file. That means having to fork a process to feed the data (or
like yash fall back to a temp file for big heredocs). That also
means the fd is no longer seekable

The change could break some scripts for bash, as on Linux (where
/dev/fd/n behaves differently from other *nices), we see some
doing:

cmd1 /dev/fd/3 3<<< "$(cmd2)"

to emulate zsh's cmd1 =(cmd2) (command substitution using a temp
file). (A 0400 umask also makes a =(...) file unreadable, but
definitely here it's the user's problem).

4. Reset the umask temporarily to 077 before creating the temp
file (and block trapped signals until it's restored).
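
Note that users can apply that one themselves in the meantime,
as it's the umask in effect when the redirection is processed
that matters:

$ bash -c 'umask 400; (umask 077; cat <<< test)'
test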


2 would have my preference.

-- 
Stephane



Re: some problems with scope of fds in process substitution

2017-12-04 Thread Stephane Chazelas
2017-12-04 08:46:24 -0500, Chet Ramey:
[...]
> Bash-4.4 allows you to wait for the last process substitution, since the
> pid appears in $!, like ksh93.

Thanks,

I hadn't noticed it had changed in 4.4
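
For instance, with 4.4 or newer:

$ rm -f a; bash -c 'echo test > >(cat > a); wait "$!"; cat a'
test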

One major difference with ksh93 though is that it won't work with

cmd | tee >(cmd2)

unless you enable lastpipe.

In:

cmd | tee >(cmd2) | cmd3

you won't get access to cmd2's pid in $! either in ksh93, but
those are usually OK as cmd2's stdout also goes to the pipe to
cmd3, so it's "waited for" by virtue of cmd3 waiting for it.

In any case, one can always work around it without lastpipe by doing:

cmd | (tee >(cmd2); ret=$?; wait "$!" || exit "$ret")



> There's still no option to wait for more
> than one, though I imagine I could make that work as part of `wait'
> without arguments.
[...]

Yes, that would be useful and align with ksh93.

That could however break some workflows like

exec > >(tee logfile) # not supported by ksh93

cmd1 & cmd2 &
wait


Or:

{
  cmd1 & cmd2 &
  wait
} > >(tee logfile)

-- 
Stephane



Re: some problems with scope of fds in process substitution

2017-12-03 Thread Stephane Chazelas
2017-12-03 17:31:00 -0500, Chet Ramey:
> On 12/1/17 2:00 PM, Stephane Chazelas wrote:
> 
> > Also, there's a lot of problems reported at
> > unix.stackexchange.com at least that are caused by bash not
> > waiting for the processes started by process substitutions,
> > especially the >(...) form.
> 
> Bash always reaps these processes. Do you mean waiting for them
> to terminate before executing the next command?
[...]

Hi Chet,

yes, that's what I meant as in:

$ rm -f a; bash -c 'echo test > >(cat > a); cat a'
$

In:

cmd1 <(cmd2)
cmd3

It's usually OK not to wait for cmd2, same as

cmd2 | cmd1
cmd3

where not all shells wait for cmd2 (before running cmd3), as
cmd1 will generally wait for cmd2 by virtue of it waiting on
EOF on the pipe. But in:

cmd1 >(cmd2)

It's more cmd2 that would be waiting for cmd1. So when cmd1
returns, often, cmd2 has not finished yet, and if cmd3 needs
something produced by cmd2, that's where we run into problems.

See the link I gave (https://unix.stackexchange.com/a/403788)
for more details including the situation with other shells that
implement process substitution (ksh, zsh, rc, es).

Cheers
Stephane



some problems with scope of fds in process substitution

2017-12-01 Thread Stephane Chazelas
FYI,

as seen at https://unix.stackexchange.com/a/408171, there are
still a few "problems" with process substitution, where some fds
are closed where they probably shouldn't:

> Note that even with the latest (4.4.12 as of writing) version, bash still has 
> a few bugs here like:
> 
> $ bash -c 'eval cat <(echo test)'
> test # OK but:
> $ bash -c 'eval "echo foo;cat" <(echo test)'
> foo
> cat: /dev/fd/63: No such file or directory
> $ bash -c 'eval f=<(echo test) "; cat \$f"'
> cat: /dev/fd/63: No such file or directory
> 
> and some still triggered by pipes like:
> 
> $ cat file
> echo "$1"
> cat "$1"
> $ bash -c 'source ./file <(echo test)'
> /dev/fd/63
> test  # OK but:
> $ cat file2
> echo "$1" | wc -c
> cat "$1"
> $ bash -c 'source ./file2 <(echo test)'
> 11
> cat: /dev/fd/63: No such file or directory

Also, there's a lot of problems reported at
unix.stackexchange.com at least that are caused by bash not
waiting for the processes started by process substitutions,
especially the >(...) form.

It would be more useful IMO if bash waited for them like zsh
does (under some conditions) or as ksh93 can be told to.

More details at:

https://unix.stackexchange.com/a/403788

-- 
Stephane



command substitution inside parameter expansion inside "for ((;;))"

2017-11-15 Thread Stephane Chazelas
Hello,

$ bash -c 'for ((i = 0; $(echo 0); i++)); do echo x; done'

(OK)

$ bash -c 'for ((i = 0; ${x-`echo 0`}; i++)); do echo x; done'

(OK)

$ bash -c 'for ((i = 0; ${x-$(echo 0)}; i++)); do echo x; done'
bash: -c: line 0: syntax error near unexpected token `newline'
bash: -c: line 0: `for ((i = 0; ${x-$(echo 0)}; i++)); do echo x; done'

It's the same for ${x#$(echo 0)}.

I also noticed that cmd was not run in ${x#`cmd`} if x happens
to be empty (it's also the case in dash and ksh93 though only if
x is unset; zsh, mksh and yash are fine).

$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)

-- 
Stephane



Re: bug with case conversion of UTF-8 characters

2017-10-02 Thread Stephane Chazelas
2015-01-22 14:43:00 +, Stephane Chazelas:
[...]
> Bash Version: 4.3
> Patch Level: 30
> Release Status: release
> 
> (Debian unstable amd64)
> 
> $ LC_ALL=tr_TR.UTF-8 bash -c 'typeset -l a; a=İ; echo $a' | hd
> 00000000  69 b0 0a                                          |i..|
> 00000003
[...]

Hi. While that particular bug seems to have been fixed in 4.4,
it looks like there's still a problem in those Turkish locales
where uppercase i is İ and lowercase I is ı.

$ X=AEIOU LC_ALL=tr_TR.UTF-8 bash -c 'echo "${X,,}"'
aeIou
$ X=aeiou LC_ALL=tr_TR.UTF-8 bash -c 'echo "${X^^}"'
AEiOU

same issue with typeset -l/u

$ X=aeiou LC_ALL=tr_TR.UTF-8 awk 'BEGIN{print toupper(ENVIRON["X"])}'
AEİOU
$ X=AEIOU LC_ALL=tr_TR.UTF-8 awk 'BEGIN{print tolower(ENVIRON["X"])}'
aeıou

Those ones are OK:

$ X=AEİOU LC_ALL=tr_TR.UTF-8 bash -c 'echo "${X,,}"'
aeiou
$ X=aeıou LC_ALL=tr_TR.UTF-8 bash -c 'echo "${X^^}"'
AEIOU

nocasematch seems to be OK as well.

$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)

(on Debian).

-- 
Stephane



Re: printf %d $'"\xff' returns random values in UTF-8

2017-09-17 Thread Stephane Chazelas
2017-09-17 11:01:00 +0100, Stephane Chazelas:
[...]
>wchar_t wc;
> -  size_t mblength, slen;
> +  int mblength;
[...]
> +  mblength = mbtowc (&wc, garglist->word->word+1, slen);
> +  if (mblength > 0)
> +ch = wc;
[...]

Actually, "wc" should probably be initialised to 0 to cover for
cases where the string only contains state switching sequences
in stateful encodings (in which case, mbtowc may return their
length but not set "wc" as there's no character in there). (I've
not tested it and anyway sane systems would not have locales
with such charsets so it's mostly an academic consideration).

So:


diff --git a/builtins/printf.def b/builtins/printf.def
index 3d374ff..7a840bb 100644
--- a/builtins/printf.def
+++ b/builtins/printf.def
@@ -1244,19 +1244,17 @@ asciicode ()
 {
   register intmax_t ch;
 #if defined (HANDLE_MULTIBYTE)
-  wchar_t wc;
-  size_t mblength, slen;
+  wchar_t wc = 0;
+  int mblength;
+  size_t slen;
 #endif
   DECLARE_MBSTATE;
 
 #if defined (HANDLE_MULTIBYTE)
   slen = strlen (garglist->word->word+1);
-  mblength = MBLEN (garglist->word->word+1, slen);
-  if (mblength > 1)
-    {
-      mblength = mbtowc (&wc, garglist->word->word+1, slen);
-      ch = wc; /* XXX */
-    }
+  mblength = mbtowc (&wc, garglist->word->word+1, slen);
+  if (mblength > 0)
+    ch = wc;
   else
 #endif
     ch = (unsigned char)garglist->word->word[1];


-- 
Stephane



printf %d $'"\xff' returns random values in UTF-8 (Was: printf %d $'"\xff' returns random values in UTF-8 and 0 in C locale)

2017-09-17 Thread Stephane Chazelas
Sorry, subject was wrong. The behaviour is OK in the C locale.

-- 
Stephane



printf %d $'"\xff' returns random values in UTF-8 and 0 in C locale

2017-09-17 Thread Stephane Chazelas
$ locale charmap
UTF-8
$ bash -c '"$@"' sh printf '%d\n' $'"\xff' $'"\xff' $'"\xff'
32767
0
0

That's because we store the return value of mblen() (which may be
-1) into a size_t (unsigned) variable.

See patch below which aligns the behaviour with that of other
shells which use the byte value when the initial sequence of
bytes can't be converted to a character.

So:

printf '%d\n' $'"\uff' $'"\xff'

outputs

255
255

The call to mblen() has been removed. It's wrong to use it here
as it would return -1 on a string like "ábc\x80" in UTF-8, so
would end up getting the value for the first byte instead of the
codepoint of the first character.

diff --git a/builtins/printf.def b/builtins/printf.def
index 3d374ff..67e5b59 100644
--- a/builtins/printf.def
+++ b/builtins/printf.def
@@ -1245,18 +1245,16 @@ asciicode ()
   register intmax_t ch;
 #if defined (HANDLE_MULTIBYTE)
   wchar_t wc;
-  size_t mblength, slen;
+  int mblength;
+  size_t slen;
 #endif
   DECLARE_MBSTATE;
 
 #if defined (HANDLE_MULTIBYTE)
   slen = strlen (garglist->word->word+1);
-  mblength = MBLEN (garglist->word->word+1, slen);
-  if (mblength > 1)
-    {
-      mblength = mbtowc (&wc, garglist->word->word+1, slen);
-      ch = wc; /* XXX */
-    }
+  mblength = mbtowc (&wc, garglist->word->word+1, slen);
+  if (mblength > 0)
+    ch = wc;
   else
 #endif
     ch = (unsigned char)garglist->word->word[1];
diff --git a/support/bashbug.sh b/support/bashbug.sh
index 29ce134..01db35d 100644

-- 
Stephane



Re: "$@" not available in rcfile, $ENV, $BASH_ENV...

2017-09-10 Thread Stephane Chazelas
2017-09-10 14:56:50 -0400, Chet Ramey:
> On 9/10/17 11:11 AM, Stephane Chazelas wrote:
> > When running bash as:
> > 
> > bash -s foo bar
> > 
> > "$@" is not available inside .bashrc. Same for $ENV (POSIX
> > conformance issue?), $BASH_ENV, or ~/.bash_profile (with bash
> > --login -s).
> > 
> > In the case of bash -c, that also affects $0.
> 
> Bash has always behaved like this, and it's not a Posix conformance
> issue.  Is there a compelling reason to change it other than compatibility
> with other shells?
[...]

I would say it would also be more useful and the behaviour less
surprising, and I would think aligning with other shells would
be very unlikely to break backward compatibility as I can't
think of any reason why anyone would rely on the current
behaviour.

In any case, I can't see anything in the POSIX spec allowing
the bash behaviour. printf '%s\n' "$1" should output the content
of the first positional parameter, there's no specific provision
for that not to be the case when that's done while interpreting
the content of $ENV.

It came up today when somebody was looking for some way to be
able to have the user interact with a shell interpreting a
script midway through the script. I would have expected the
script below to work:

#! /bin/sh -
[ "$0" = bash ] || exec bash --rcfile "$0" -s "$@" || exit 1

trap 'after "$@"' EXIT

before() {
  echo something to do before $*
}
after() {
  echo something to do after $*
}

before "$@"

if [ -f ~/.bashrc ]; then
  . ~/.bashrc
fi

-- 
Stephane




"$@" not available in rcfile, $ENV, $BASH_ENV...

2017-09-10 Thread Stephane Chazelas
When running bash as:

bash -s foo bar

"$@" is not available inside .bashrc. Same for $ENV (POSIX
conformance issue?), $BASH_ENV, or ~/.bash_profile (with bash
--login -s).

In the case of bash -c, that also affects $0.

ksh88, ksh93, mksh, dash, zsh, posh, busybox sh can access $@ in
$ENV

Reproduce it with:

$ echo 'echo "$0" "$#" "$@"' > rc
$ (ENV=rc exec -a sh bash -s foo bar)
sh 0
sh-4.4$ exit
exit
$ bash --rcfile rc -s foo bar
bash 0
bash-4.4$ exit
$ BASH_ENV=rc bash -c : arg0 foo bar
bash 0

In some cases, one could work around it by using:
PROMPT_COMMAND='. ./rc; unset PROMPT_COMMAND' bash -s foo bar
Or pass the arguments via some other means, like encoded in an
environment variable.

-- 
Stephane



read -N and UTF-8 characters

2017-04-06 Thread Stephane Chazelas
Using "read -N" on text containing multi-byte characters
produces incorrect results.

On Debian amd64 with the latest code from git:

$ locale charmap
UTF-8
$ printf '\ue9VWXYZ' | ./bash -c 'IFS= read -rN4 a; printf %s "$a"' | hd
00000000  c3 58 a9 56 57                                    |.X.VW|
00000005

(expected c3 a9 56 57 58)

$ ./bash --version
GNU bash, version 4.4.12(6)-maint (x86_64-unknown-linux-gnu)
[...]

It seems to be a regression (since 4.3). It was working OK with 4.2

-- 
Stephane



Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-21 Thread Stephane Chazelas
2017-03-20 16:32:10 -0400, Chet Ramey:
[...]
> > See also:
> > 
> > $ bash -c 'f() { unset a; echo "$a";}; a=1; a=2 f'
> > 1
> > 
> > already mentioned.
> 
> A distinction without a difference; the behavior is explicitly the same.
[...]

One I haven't mentioned yet is:

$ bash -c 'f() { local a; unset a; echo "$a";}; a=1; a=2 f'
1

IOW, the work around I was mentioning earlier (of using "local"
before "unset" to make sure "unset" unsets) doesn't work in that
case. You'd need to use the same work around as for mksh/yash
(call unset in a loop until the variable is really unset (with
the nasty side effect of unsetting the variable in a scope
you're not meant to tamper with) so you'd want to do it in a
subshell).
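
That is, something like (a sketch of that loop):

f() (
  while [ "${a+set}" ]; do unset -v a; done
  echo "${a-unset}"
)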

-- 
Stephane



Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-20 Thread Stephane Chazelas
2017-03-20 14:47:17 -0400, Chet Ramey:
> On 3/20/17 2:30 PM, Eric Blake wrote:
> > On 03/17/2017 07:21 PM, Stephane Chazelas wrote:
> > 
> >>> The problem is the non-obvious nature of unset's interaction with scope,
> >>
> >> the main problem to me is an unset command that doesn't unset.
> >>
> >> As shown in my original post, there's also a POSIX conformance
> >> issue.
> > 
> > As POSIX has not yet specified 'local', any use of 'local' already
> > renders the script non-conformant, so it shouldn't matter what bash does
> > in that situation (although if POSIX is ever going to standardize
> > 'local', it requires some concerted effort to make all shells with
> > 'local' to settle on a lowest common denominator).
> 
> I believe he means the behavior of `a=0; a=1 eval unset a', which Posix
> implicitly requires affect the global scope and results in a being unset
> when the statement completes.
[...]

See also:

$ bash -c 'f() { unset a; echo "$a";}; a=1; a=2 f'
1

already mentioned.

In any case, those are corner cases I'm not too worried about.
I'm more concerned about "unset var" not unsetting var in real
life cases when "local"/"typeset" is being used (regardless of
POSIX).

The other POSIX related concern I was also mentioning is the
fact that the work around implies using non-POSIX constructs,
and that libraries of POSIX sh functions can't be used as-is in
bash/mksh/yash.

The particular case that affects me directly, is that
recommendations I'm giving online about POSIX compatible
constructs such as:

(unset IFS; set -f; cmd -- $var)

Which is (or at least I thought should be) *the* Bourne idiom to
split, already mentioned several times are *wrong* for bash (or
mksh or yash) in the general case, as "unset" doesn't do what it
says on the can there.

Another case of "principle of least astonishment" not being
followed IMO.

-- 
Stephane








Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-20 Thread Stephane Chazelas
2017-03-20 08:04:33 -0400, Greg Wooledge:
[...]
> > Credits to Dan Douglas
> > (https://www.mail-archive.com/miros-mksh@mirbsd.org/msg00707.html)
> > for finding the bug. He did find a use for it though (get the
> > value of a variable from the caller's scope).
> 
> Isn't this exactly the same as the "upvar" trick that Freddy Vulto
> discovered?  http://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference

Hi Greg,

Yes, I hadn't realised initially that the issue had already been
discussed before (and not fixed). You'll find that that "upvar"
trick and the link above have since been mentioned in this thread
(see also https://www.mail-archive.com/bug-bash@gnu.org/msg19445.html)

At this point, I have little hope that bash will be fixed. But
mksh/oksh and yash still might.

-- 
Stephane



Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-20 Thread Stephane Chazelas
2017-03-20 12:30:09 +0900, 渡邊裕貴:
> It seems to me this matter is specific to the IFS variable (and possibly
> few others like CDPATH). Unsetting IFS restores the default field splitting
> behavior, but unsetting PATH does not restore the default command search
> path. As Peter suggests, you can locally re-define any variables you
> want and that should work in any situation.
[...]

Hi Yuki,

you seem to be concurring with me that unset is broken and that
the work around is to not use it.

Note that unsetting PATH generally *does* restore a default
command search path. However, on many systems, not everything
agrees on the default search path (for instance on my Debian
system, for execvp(), it's ":/bin:/usr/bin" (yes, current
directory first!), for bash and dash it seems to be only the
current directory (as if PATH was set to the empty string), for
yash it seems it's nothing, for mksh "/usr/bin:/bin", for ksh93
"/bin:/usr/bin"... behaviour left "implementation defined" by
POSIX) so unsetting PATH is not useful.

Now, there are many reasons one may want to use unset.

For instance, unsetting LC_* restores LANG, one may want to
unset LD_LIBRARY_PATH, GCONV_PATH, LOCPATH, PERL5LIB,
PYTHON_PATH... for security reasons or to get a consistent
behaviour. In POSIX sh language, unsetting a variable is the
only way to unexport it. Same for changing the type of a
variable to scalar in bash without declaring it local.
zsh/yash/mksh have "typeset -g" for that, but in bash typeset -g
affects the variable in the global scope instead of preventing
the restricting the scope in other shells.

unset is also commonly used to make sure variables have a
default value of "unset", like in things like:

rmv() (
  unset OPTIND force interactive verbose
  while getopts :ivf o; do
    case $o in
      (f) force=1;;
      ...
    esac
  done
  shift "$((OPTIND - 1))"
  exec rm ... ${force+"-f"} "$@"
)

Replacing the "unset force" with "force=" (and use
${force:+"-f"}) to work around the unset bug would not be enough
with bash/mksh as $force might have been defined as "integer" in
a parent scope. So one would need to use "typeset force=", or
just "typeset force", which declares it with an "unset" value,
but that would make it no longer standard sh code (so that
function can no longer be used in sh scripts).

-- 
Stephane



Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-19 Thread Stephane Chazelas
2017-03-19 18:05:19 -0400, Chet Ramey:
> On 3/19/17 5:51 PM, Stephane Chazelas wrote:
> 
> > On comp.unix.shell ot http://unix.stackexchange.com, I've posted
> > many articles describing how to do splitting in POSIX-like
> > shells:
> > 
> > ( # subshell for local scope
> >   unset -v  IFS # restore default splitting behaviour
> >   set -o noglob # disable globbing
> >   cmd -- $var   # split+glob with default IFS and glob disabled
> > )
> > 
> > I'm now considering adding a note along the lines of:
> > 
> >   "Beware that with current versions of bash, pdksh and yash,
> >   the above may not work if used in scripts that otherwise use
> >   typeset/declare/local on $IFS or call a function with
> >   `IFS=... my-function' (or IFS=... eval... or IFS=...
> >   source...)"
> 
> You can, of course, do whatever you want.  You might want to read my
> message from yesterday about what happens when you do that, or look
> at the following examples, after which you may decide that the situation
> is not as dire.
> 
> $ cat x2
> function foo
> {
> (
>   unset -v IFS
>   recho "${IFS-unset}"
> )
> }
> 
> IFS=':|'
> foo
> echo after IFS = "$IFS"
> $ ../bash-4.4-patched/bash ./x2
argv[1] = <unset>
> after IFS = :|

Yes, that one is fine, but it is not the issue that is being
discussed here. There's no variable to pop off a stack above.

the issue is when that "foo" function is called in a context
where IFS had been declared locally. Like in:

IFS=1
function example {
  typeset IFS=2
  foo
}

Where "foo" would output "1", because then "unset -v IFS" would
*not* have unset IFS but instead would have restored the value
it had before the "typeset" (in that case, the global scope).

-- 
Stephane



Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-19 Thread Stephane Chazelas
2017-03-18 13:16:56 -0400, Chet Ramey:
> On 3/17/17 5:51 PM, Stephane Chazelas wrote:
> 
> > Now, if that "split" functions is called from within a function
> > that declares $IFS local like:
>   [...]
> > because after the "unset IFS", $IFS is not unset (which would
> > result in the default splitting behaviour) but set to ":" as it
> > was before "bar" ran "local IFS=."
[...]

For bash, it looks like the boat has sailed as the issue has
been discussed before, but let me at least offer my opinion, and
also add the maintainers of yash and mksh in Cc so they can
comment as they have similar issues in their shell which they
may want to address (at least in the documentation). It's even
worse for mksh and yash as it's harder to work around there.

  For Yuki and Thorsten, see the start of the discussion at 
  https://www.mail-archive.com/bug-bash@gnu.org/msg19431.html
  (and
  https://www.mail-archive.com/miros-mksh@mirbsd.org/msg00697.html
  before that)

  In short, the issue is that "unset var" does not always leave
  $var unset (contrary to what the documentation or the name of
  the command suggest) but may instead restore a previous value.
  Reproducer for bash/pdksh/yash:

  $ f()(unset a; echo "$a"); g() { typeset a=2; f; }; a=1; g
  1

  For bash, also:

  $ f()(unset a; echo "$a"); a=1; a=2 f
  1

  One work around for bash is:

  f()(local a; unset a; echo "$a");  g() { typeset a=2; f; }; a=1; g

  as long as we want "$a" to be unset only for the local
  function (already enforced by the subshell in this case) or with
  bash/mksh/yash:

  f()(while [ "${a+set}" ]; do unset a; done
echo "$a");  g() { typeset a=2; f; }; a=1; g

  (again, not likely to do what we want when not called in a
  subshell)

  (see also 
  f()(unset "a[0]"; echo "$a"); g() { typeset a=2; f; }; a=1; g
  in pdksh that doesn't unset "$a" but makes it an array with no
  element).
> 
> This is how local variables work.  Setting a local variable shadows
> any global with the same name; unsetting a local variable reveals the
> global variable (or really, because bash has dynamic scoping, the value
> at a previous scope) since there is no longer a local variable to shadow
> it.
[...]

Chet, the behaviour you describe above would be that of a "popvar"
(not "unset") command, an arcane command for an arcane feature:
pop a variable off a stack to restore the value (and attributes)
it had in an outer scope. A feature I would probably never need
in a million years. The only known usage of it being that hack
(http://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference)
to be able to return a value into a variable passed as argument
to a function while still being able to use a local variable
with the same name in the function.

There is no way any sane person would write

   unset IFS

and mean anything else than unsetting the IFS variable (make
sure $IFS is not set afterwards so word splitting revers to the
default).

There's no way any sane person would expect that to mean
"restore the variable from an outer scope I don't known about"
(yash/pdksh) or in the case of bash: "restore the variable from
an outer scope unless I've declared it in the current function
context".

unsetting variables is an essential feature in shells, as many
variables, especially those in the environment, have a special
meaning and affect behaviour when set.

I can't imagine it being anything other than an unintended
accident of implementation, certainly not an intentional feature
of the language (at least not initially).

In all other languages that have a "unset"/"undef"/"delete"
similar feature (tcl, perl, php, ksh88 (dynamic scoping), ksh93
(static scoping), zsh, dash at least), unset unsets the variable
in the innest scope it has been declared in. I don't know of any
language that has a "popvar" feature to allow the user to unravel
the variable stack behind the back of the interpreter.

Several languages with static scoping (tcl with upvar, ksh93
with "typeset -n", python3 with nonlocal at least) have a way to
access variables in a parent scope, but with dynamic scoping,
there's no need for that. Child functions already have access to
the parent variables.

The issue (that there's no notion of variable reference in
those shells) that
http://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference
tries to hack around is better addressed IMO with namespacing
(like return the value in a dedicated variable (REPLY for
instance is already used for that internally in bash and several
other shells) or make sure utility functions that modify
arbitrary variables use a dedicated prefix for their own
variables))

In an

Re: "unset var" pops var off variable stack instead of unsetting it

2017-03-17 Thread Stephane Chazelas
2017-03-17 17:35:36 -0500, Dan Douglas:
> The need to localize IFS is pretty obvious to me - of course that's
> given prior knowledge of how it works.
[...]

I don't expect the need to have to add "local var" in

(
   unset -v var
   echo "${var-OK}"
)

would be obvious to many people besides you though.

People writing function libraries meant to be used by several
POSIX-like shells need to change their code to:

split() (
  [ -z "$BASH_VERSION" ] || local IFS # WA for bash bug^Wmisfeature
  unset -v IFS
  set -f 
  split+glob $1
)

if they want them to be reliable in bash.

> The problem is the non-obvious nature of unset's interaction with scope,

the main problem to me is an unset command that doesn't unset.

As shown in my original post, there's also a POSIX conformance
issue.

> (and the lack of documentation). Not much can be done about the former,
> as it is with so many things.

So what should the documentation be? With my "eval" case in
mind, it's hard to explain without getting down to how stacking
variables work. Maybe something like:

after unset -v var
  - if var had been declared (without -g) in the current
function scope (not the global scope), $var becomes unset in
the current scope (not in parent scopes). Further unset
attempts will not affect the variable in parent scopes.
  - otherwise, the previous var value (and type and attributes)
is popped from a stack. That stack is pushed every time the
variable is declared without -g in a new function scope or
when the "." or "eval" special builtins are invoked as var=x
eval 'code' or var=x . sourced-file. If the stack was empty,
the variable is unset.

There's also missing documentation for:

unset -v 'var[x]' (note the need to quote that glob)
  can only be used if "var" is an array or hash variable and unsets
  the array/hash element of key x. Unsetting the last element
  does not unset the variable. For arrays, negative subscripts
  are relative to the greatest assigned subscript in the array.
  unset "a[-1]" "a[-1]" unsets the 2 elements with the greatest
  subscript, but that's not necessarily the case for unset
  "a[-2]" "a[-1]" if the array was sparse.

  unset "var[@]" or unset "var[*]" can be used to unset all the
  elements at once. For associative arrays, use unset 'a[\*]' or
  unset 'a[\@]' to unset the elements of key * and @. It is not
  possible [AFAICT] to unset the element of key "]" or where the
  key consists only of backslash characters [btw, it also looks
  like bash hashes (contrary to zsh or ksh93 ones) can't have an
  element with an empty key]

It is not an error to attempt to "unset" a variable or array
element that is not set, except when using negative subscripts.
  
Also, the doc says:

>  The -v flag specifies that NAME refers to parameters.
>  This is the default behaviour.

It might be worth pointing out that "unset -v", contrary to the
default behaviour, won't unset functions so it's a good idea to
use "unset -v" instead of "unset" if one can't guarantee that
the variable was set beforehand (like the common case of using
unset to remove a variable which was potentially imported from
the environment).

-- 
Stephane



"unset var" pops var off variable stack instead of unsetting it

2017-03-17 Thread Stephane Chazelas
Hi,

consider this function:

split() (
  unset -v IFS  # default splitting
  set -o noglob # disable glob

  set -- $1 # split+(no)glob
  [ "$#" -eq 0 ] || printf '<%s>\n' "$@"
)

Note the subshell above for the local scope for $IFS and for
the noglob option. That's a common idiom in POSIX shells when
you want to split something: subshell, set IFS, disable glob,
use the split+glob operator.

split 'foo * bar'

outputs

<foo>
<*>
<bar>

as expected. So far so good.

Now, if that "split" functions is called from within a function
that declares $IFS local like:

bar() {
  local IFS=.
  split $1
}

Then, the "unset", instead of unsetting IFS, actually pops a
layer off the stack.

For instance

foo() {
  local IFS=:
  bar $1
}

foo 'a b.c:d'

outputs

<a b>

instead of

<a>
<b>

because after the "unset IFS", $IFS is not unset (which would
result in the default splitting behaviour) but set to ":" as it
was before "bar" ran "local IFS=."

A simpler reproducer:

$ bash -c 'f()(unset a; echo "$a"); g(){ local a=1; f;}; a=0; g'
0

Or even with POSIX syntax:

$ bash -c 'f()(unset a; echo "$a"); a=0; a=1 eval f'
0

A work around is to change the "split" function to:

split() (
  local IFS
  unset -v IFS  # default splitting
  set -o noglob # disable glob

  set -- $1 # split+(no)glob
  [ "$#" -eq 0 ] || printf '<%s>\n' "$@"
)

For some reason, in that case (when "local" and "unset" are
called in the same function context), unset does unset the
variable.

Credits to Dan Douglas
(https://www.mail-archive.com/miros-mksh@mirbsd.org/msg00707.html)
for finding the bug. He did find a use for it though (get the
value of a variable from the caller's scope).

-- 
Stephane



Issues in option handling (Was: break no longer breaks out of loops defined in an outer context)

2017-03-07 Thread Stephane Chazelas
2017-03-03 08:27:03 -0500, Chet Ramey:
> On 3/1/17 4:58 PM, Stephane Chazelas wrote:
> 
> > BTW, there seems to have been a regression in the handling of the -O
> > option to the bash interpreter:
> 
> Thanks, I'll take a look.  It looks to be specific to the various
> compat* options.
[...]

See also:

$ POSIXLY_CORRECT=1 bash -c 'set +o posix; set +o' | grep posix
set +o posix
$ POSIXLY_CORRECT=1 bash +o posix -c 'set +o' | grep posix
set -o posix

(not a regression, already in 4.3)

I'd expect explicit command line options to take precedence over
the environment.

Another oddity:

$ env SHELLOPTS= bash -o posix -c :
$ env SHELLOPTS= bash +o posix -c :
bash: SHELLOPTS: readonly variable

-- 
Stephane



Re: break no longer breaks out of loops defined in an outer context

2017-03-01 Thread Stephane Chazelas
2017-03-01 09:49:52 -0500, Chet Ramey:
[...]
> > Would you recommend people start adding:
> > 
> > shopt -s compat44 2> /dev/null || true
> > 
> > at the start of their script that they develop for bash-4.4 now
> > so that it still works even when bash-6.0 makes a non-backward
> > compatible change?
> 
> I know this isn't a serious question, but I'll answer it anyway.
> No. People should look at the effects of each compatibiity version
> option and decide for themselves.
> 
> > 
> > It seems there's a compatXX option for each of the versions
> > since bash31. Will you keep doing it for every version?
> 
> Most likely, but I will probably phase out the shopt options in
> favor of BASH_COMPAT.
[...]

Thanks,

I wasn't aware of BASH_COMPAT. So instead of

shopt -s compat44 2> /dev/null || true

I could use:

BASH_COMPAT=4.4

except that it gives a
bash: BASH_COMPAT: 4.4: compatibility value out of range

if run from an older version of bash (note that it does not set
$? to non-zero nor trigger "set -e").
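
One could silence it on older versions with (a sketch; assuming
the message goes to stderr):

BASH_COMPAT=4.4 2> /dev/null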

It seems I would want to specify the lowest version of bash I
want to support in the script. Which makes sense.

BTW, there seems to have been a regression in the handling of the -O
option to the bash interpreter:

$ ./bash -O compat31 -c 'echo "$BASH_VERSION"; [[ a =~ "." ]] && echo yes'
4.4.12(4)-maint
$ ./bash -c 'echo "$BASH_VERSION"; shopt -s compat31; [[ a =~ "." ]] && echo yes'
4.4.12(4)-maint
yes

Was OK with 4.3:

$ bash -O compat31 -c 'echo "$BASH_VERSION"; [[ a =~ "." ]] && echo yes'
4.3.46(1)-release
yes


(affects at least compat43 as well, not failglob for instance).

-- 
Stephane



Re: break no longer breaks out of loops defined in an outer context

2017-03-01 Thread Stephane Chazelas
2017-02-28 16:18:05 -0500, Chet Ramey:
[...]
> > Just my personal opinion, but I think I'd rather the spec had
> > been updated to accommodate the bash (and many other shells)
> > behaviour rather than bash breaking its backward compatibility
> > to comply with a requirement that is not particularly useful
> > myself.
> 
> That's why I made the bash-4.3 behavior available via the shell
> compatibility level mechanism.
[...]

It's good to have at least a mechanism to make sure scripts
are not going to be broken by a future, incompatible version of
the shell.

Would you recommend people start adding:

shopt -s compat44 2> /dev/null || true

at the start of their script that they develop for bash-4.4 now
so that it still works even when bash-6.0 makes a non-backward
compatible change?

It seems there's a compatXX option for each of the versions
since bash31. Will you keep doing it for every version?

Maybe a shopt -s compat=44 to avoid filling up the option
namespace would be better. It's not clear what combining several
compat options could do. For instance, shopt -s compat31
compat43 would not be compatible with bash-4.3, in that the [[ =~
]] operator would work the bash-3.1 way. That could cause
confusion.

(or #! /bin/bash -o compat=4.4 or --compat=4.4 to link to a
parallel thread here).
-- 
Stephane



Re: break no longer breaks out of loops defined in an outer context (was: Bug????)

2017-02-28 Thread Stephane Chazelas
2017-02-28 19:43:11 +0100, tarot:
> Gr! it is not a bug!!!
> 
> 
> xx. Fixed a bug that could allow `break' or `continue' executed from shell
> functions to affect loops running outside of the function.
> 
> My BIG script doesn't work with bash-4.4
> 

There was a related discussion on the Austin group mailing list
back in May last year:

http://permalink.gmane.org/gmane.comp.standards.posix.austin.general/12614

Just my personal opinion, but I think I'd rather the spec had
been updated to accommodate the bash (and many other shells)
behaviour rather than bash breaking its backward compatibility
to comply with a requirement that is not particularly useful
myself.

You'll find more at that thread, though with the gmane web
interface not fully back up, it's hard to get the whole thread
(you can increase that 12614 number above).

-- 
Stephane




Re: Why two separate option namespaces?

2017-02-28 Thread Stephane Chazelas
2017-02-27 16:18:46 -0500, Chet Ramey:
> On 2/27/17 11:50 AM, Martijn Dekker wrote:
> 
> > So basically you're saying that, for options without a single-letter
> > equivalent, "-o" options are those that are either POSIX or that you
> > think should be POSIX? But then that distinction is more political than
> > technical, isn't it?
> 
> Heh. Let's just say that I'm leaving the `set -o' namespace to Posix.
[...]

Well, not really since bash already has extensions over the
POSIX ones.

Note that all of ksh, yash and zsh at least have extensions of
their own there.

bash seems to be the last actively maintained POSIX-certified
shell, so I don't expect POSIX would come up with option names
that would conflict with bash's, so from that point of view, it
would be safe to merge the name spaces.

It may be worth checking other shell implementations to see if
there are potential conflicts. Uniformizing between all shells
would help writing scripts portable across several of the modern
Bourne-like shells.

zsh and yash are being conciliatory there in that they ignore
case, underscore (even hyphen for yash) and treat a "no" prefix
as inverting the setting.

yash and zsh also accept --<long-option> to "set" and to the
interpreter (in the case of yash, a la GNU, that is accepting
unambiguous abbreviations like set --posix for set
--posixlycorrect).

bash also accepts --<long-option> (but does not support GNU-like
abbreviations) for a restricted set of options like
verbose/posix. See also bash --restricted vs bash -O
restricted_shell (the latter one being ineffective).
(that's another set of "options" that may be worth merging).

So bash's shopt -s nocasematch also works as set -o nocasematch
in those shells (in addition to set +o CASE_MATCH for instance).
And bash's set -o posix also works (though with different
effect even if the intention is the same) in zsh and yash (even
if posix is not a valid option name in yash).

There are things like bash.extglob vs zsh.kshglob (zsh also has
zsh.extendedglob for its own extended globs)

-- 
Stephane




Re: echo -n

2017-02-06 Thread Stephane Chazelas
2017-02-06 09:45:26 +0530, Jyoti B Tenginakai:
[...] 
> Again I see that this printf we can use. But there are some scenarios where
> the o/p does not exactly match with echo. So  still its good to have a way
> to pirnt -n /-e/-E with echo. Can this be considered as bug and can this be
> fixed?
[...]

echo -e '\055n' # on ASCII-based systems

echo -e '-n\n\c'

echo -ne '-n\n'

All three output "-n" with bash's echo with the default options.

When in Unix conformance mode (shopt -s xpg_echo; set -o posix)

echo -n

outputs -n followed by a newline character, while

echo '-n\c'

outputs it without the newline character.

All the functionality of echo is available in printf. The %b
format directive has especially been introduced (by POSIX) to
cover for the peculiar style of expansion done by echo (note the
need for \055 instead of \55 like everywhere else) so we have no
excuse to keep using echo.

echo_E() {
  local IFS=' '
  printf '%s\n' "$*"
}

echo_e() {
  local IFS=' '
  printf '%b\n' "$*"
}

echo_nE() {
  local IFS=' '
  printf %s "$*"
}

echo_ne() {
  local IFS=' '
  printf %b "$*"
}
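
With those, option-looking arguments are output as-is, e.g.:

$ echo_E -n -e
-n -e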

echo is broken. Stop using it.

ksh and zsh introduced a "print" builtin that fixes some of
echo's problems (though it still does expansions by default,
which you can disable with -r). bash chose not to implement it,
but does implement the POSIX (though less Unixy in its CLI
style) printf instead.

-- 
Stephane





Re: echo -n

2017-02-02 Thread Stephane Chazelas
2017-02-02 22:26:22 +0530, Jyoti B Tenginakai:
[...]
> I have tried using the printf instead of echo. But the issue with printf
> is , the behaviour is not consistent with what echo prints for all the
> inputs i.e.
> In my script I am generically using echo for all the options. If I have to
> use printf instead of it should behave consistently .
> if echo * is passed to bash shell, the o/p shows the \t seperated values
> whereas with printf '%s'  *, it won't display space separated output. Again
> printf '%s ' # behaviour is different from what echo # shows
[...]

See also:

https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo


In bash, you can define:

puts() {
  local IFS=' '
  printf '%s\n' "$*"
}

as a function that outputs its arguments separated by spaces and
terminated with a newline character.

POSIXly:

puts() (
  IFS=' '
  printf '%s\n' "$*"
)

With some shells like bash, that implies an additional
fork.

Note that ksh and zsh also have:

print -r -- *

for that.

-- 
Stephane




Re: incorrect lseek() when processing script ending in unterminated line

2016-12-21 Thread Stephane Chazelas
2016-12-21 14:35:44 +0000, Stephane Chazelas:
[...]
> That sync_buffered_stream is meant to seek back to where we're
> meant to resume reading the script when we've read more than
> needed, but here b_inputp > b_used would suggest we've processed
> code that is passed what we've read. Or more likely b_used has
> incorrectly been set to 0.
[...]

Looks like it should only be a matter of:

diff --git a/input.c b/input.c
index 308b87e4..a03911d3 100644
--- a/input.c
+++ b/input.c
@@ -504,7 +504,7 @@ b_fill_buffer (bp)
   nr = zread (bp->b_fd, bp->b_buffer, bp->b_size);
   if (nr <= 0)
     {
-      bp->b_used = 0;
+      bp->b_used = bp->b_inputp = 0;
       bp->b_buffer[0] = 0;
       if (nr == 0)
 	bp->b_flag |= B_EOF;

Though I've not done any testing other than it fixes that
particular problem.

-- 
Stephane



incorrect lseek() when processing script ending in unterminated line

2016-12-21 Thread Stephane Chazelas
Hello.

That was discovered at
http://unix.stackexchange.com/a/331884

Consider this script that modifies itself (and happens not to
end in a newline character):

$ printf %s 'printf "\necho %s\n" {1..10} >> $0' > script.sh; bash -x ./script.sh
+ printf '\necho %s\n' 1 2 3 4 5 6 7 8 9 10
+ echo 1
1
+ echo 2
2
+ echo 3
3
+ echo 4
4
+ echo 5
5
+ echo 6
6
+ echo 7
7
+ echo 8
8
+ echo 9
9
+ echo 10
10

That's fine so far. Now, if I run an external command instead of
printf (like /usr/bin/printf):

$ printf %s '/usr/bin/printf "\necho %s\n" {1..10} >> $0' > script.sh; bash -x ./script.sh
+ /usr/bin/printf '\necho %s\n' 1 2 3 4 5 6 7 8 9 10
+ ho 6
./script.sh: line 2: ho: command not found
+ echo 7
7
+ echo 8
8
+ echo 9
9
+ echo 10
10

Running bash under strace, we see:

read(255, "/usr/bin/printf \"\\necho %s\\n\" {1"..., 43) = 43
read(255, "", 43)   = 0
brk(0x12c5000)  = 0x12c5000
write(2, "+ /usr/bin/printf '\\necho %s\\n' "..., 53+ /usr/bin/printf '\necho %s\n' 1 2 3 4 5 6 7 8 9 10
) = 53
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, 43, SEEK_CUR)= 86

Note that it's a SEEK_CUR, not SEEK_SET above, so we seek 43
bytes past the end of the file.

In gdb breaking on that lseek():

(gdb) bt
#0  lseek64 () at ../sysdeps/unix/syscall-template.S:84
#1  0x555c808f in sync_buffered_stream (bfd=<optimized out>) at input.c:554
#2  0x555aeff8 in make_child (command=command@entry=0x55925808 "/usr/bin/printf \"\\necho %s\\n\" {1..10} >> $0", async_p=async_p@entry=0) at jobs.c:1910
#3  0x55599fcd in execute_disk_command (words=words@entry=0x55928d88, redirects=0x55928b08, command_line=command_line@entry=0x5592c5c8 "/usr/bin/printf \"\\necho %s\\n\" {1..10} >> $0", pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, async=async@entry=0, fds_to_close=0x55928aa8, cmdflags=0) at execute_cmd.c:5232
#4  0x5559ac52 in execute_simple_command (simple_command=0x55928a48, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, async=async@entry=0, fds_to_close=fds_to_close@entry=0x55928aa8) at execute_cmd.c:4429
#5  0x5559bcea in execute_command_internal (command=command@entry=0x55928a08, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x55928aa8) at execute_cmd.c:806
#6  0x5559dfa2 in execute_command (command=0x55928a08) at execute_cmd.c:405
#7  0x55585e30 in reader_loop () at eval.c:180
#8  0x555848fa in main (argc=3, argv=0x7fffdb78, env=0x7fffdb98) at shell.c:792
(gdb) frame 1
#1  0x555c808f in sync_buffered_stream (bfd=<optimized out>) at input.c:554
554	lseek (bp->b_fd, -chars_left, SEEK_CUR);
(gdb) p *bp
$1 = {b_fd = 255, b_buffer = 0x55927f08 "", b_size = 43, b_used = 0, b_flag = 1, b_inputp = 43}

Here b_used < b_inputp which as far as I understand is not meant to happen.

That sync_buffered_stream is meant to seek back to where we're
meant to resume reading the script when we've read more than
needed, but here b_inputp > b_used would suggest we've processed
code that is passed what we've read. Or more likely b_used has
incorrectly been set to 0.

I stopped looking at that point.

-- 
Stephane



Re: [bug] [[ $'\Ux' = $'\Ux' ]] returns false for some values of x in some locales

2016-11-06 Thread Stephane Chazelas
2016-11-04 12:29:03 +0000, Stephane Chazelas:
[...]
> $ LC_ALL=zh_HK.big5hkscs locale charmap
> BIG5-HKSCS
> 
> Most of the problematic characters are the ones ending in 0x5c
> (which happens to be backslash in ASCII (or in BIG5-HKSCS when
> standing alone)).
[...]

Those characters are also a problem for "read", "echo" and
probably many other cases:

$ echo $'\u3b1 b c' | bash -c 'read a b c; echo $b'
c
$ echo $'\u3b1 b c' | ksh93 -c 'read a b c; echo $b'
c
$ echo $'\u3b1 b c' | zsh -c 'read a b c; echo $b'
b
$ echo $'\u3b1 b c' | yash -c 'read a b c; echo $b'
b
$ locale charmap
BIG5-HKSCS

(ksh93 has a similar bug).

\u3b1 is the Greek lower case alpha character encoded as a3 5c
in that Hong Kong charset.

Also:

$ export alpha=$'\u3b1'
$ printf 'A%sB\n' "$alpha" | bash -c 'IFS=$alpha read a b c; echo $b'


(that one is OK in ksh93, zsh and bash).

$ bash -c 'echo -e "a${alpha}b"' | LC_ALL=C sed -n l
a\243\b$

(second byte of \u3b1 with "b" expanded to BS).

(same bug in zsh and ksh93, only yash OK).

(same with $'...' and printf)

-- 
Stephane




[bug] [[ $'\Ux' = $'\Ux' ]] returns false for some values of x in some locales

2016-11-04 Thread Stephane Chazelas
(reproduced with bash 4.3 or 4.4 on Debian unstable and Ubuntu 16.04).

perl -le "printf q([[ $'\U%X' = $'\U%X' ]] || echo %06X: $'\U%X').\"\n\",
  \$_,\$_,\$_,\$_ for (1..0xd7FF, 0xE000..0x10)" |
  LC_ALL=zh_HK.big5hkscs bash | LC_ALL=C sed -n l

Where the perl command outputs:

[[ $'\U1' = $'\U1' ]] || echo 01: $'\U1'
[[ $'\U2' = $'\U2' ]] || echo 02: $'\U2'
[[ $'\U3' = $'\U3' ]] || echo 03: $'\U3'
[[ $'\U4' = $'\U4' ]] || echo 04: $'\U4'


for all valid (albeit not necessarily assigned, let alone available in any
charset) Unicode codepoints.

Gives:

CA: $
CB: \\u00CB$
EA: $
EB: \\u00EB$
00011A: \210\\$
0003B1: \243\\$
000436: \310\\$
003075: \307\\$
003618: \234\\$
003661: \215\\$
0044C0: \226\\$
004A35: \232\\$
004AA4: \207\\$
004E48: \244\\$
004F62: \312\\$
004FDE: \253\\$
005045: \324\\$
00509C: \330\\$
00515D: \242\\$
00529F: \245\\$
005412: \246\\$
00542D: \247\\$
0056ED: \373\\$
00577C: \251\\$
0057A5: \316\\$
00587F: \341\\$
0058A6: \274\\$
0058F0: \211\\$
005A09: \256\\$
005A16: \321\\$
005A2B: \230\\$
005AF9: \345\\$
005B1E: \351\\$
005B40: \304\\$
005C10: \311\\$
005CA4: \314\\$
005D24: \261\\$
005E4B: \335\\$
005EC4: \264\\$
0060DD: \325\\$
006127: \267\\$
0063CA: \331\\$
0064FA: \302\\$
00669D: \272\\$
0067AF: \254\\$
0067E6: \317\\$
0069D9: \342\\$
006A9D: \375\\$
006B7F: \252\\$
006C7B: \313\\$
006C94: \250\\$
006D82: \322\\$
006DDA: \262\\$
006EDC: \336\\$
006F7F: \346\\$
007019: \362\\$
007035: \364\\$
00712E: \332\\$
0071E1: \355\\$
00727E: \326\\$
0072D6: \315\\$
007366: \352\\$
0073E2: \227\\$
0073EE: \257\\$
007435: \265\\$
00749E: \277\\$
0075B1: \236\\$
007667: \240\\$
007912: \360\\$
007A1E: \270\\$
007A40: \275\\$
007B0B: \216\\$
007BA4: \343\\$
007CED: \231\\$
007D85: \337\\$
007E37: \301\\$
007F61: \323\\$
0080D0: \320\\$
0080EC: \213\\$
00812A: \223\\$
0082D2: \255\\$
00833B: \333\\$
00838D: \327\\$
0084CB: \273\\$
00850C: \347\\$
00855A: \217\\$
00878F: \353\\$
0087B0: \356\\$
008A31: \263\\$
008C79: \260\\$
008D15: \367\\$
008D68: \340\\$
008DDA: \266\\$
008E0A: \344\\$
008E7E: \212\\$
008EA1: \306\\$
009103: \334\\$
009140: \363\\$
009145: \366\\$
009186: \350\\$
00923E: \271\\$
0093AA: \361\\$
0095B1: \276\\$
0097B8: \233\\$
009910: \300\\$
009924: \354\\$
0099F9: \357\\$
009A31: \365\\$
009ACF: \305\\$
009AE2: \221\\$
009AFF: \237\\$
009C4B: \370\\$
009C6D: \371\\$
009EE0: \303\\$
00FE4F: \241\\$
0205EB: \224\\$
020C3A: \376\\$
023600: \372\\$
0265AD: \225\\$
026C21: \222\\$
0270F8: \374\\$
02870F: \214\\$
02913C: \235\\$
02A014: \220\\$

$ LC_ALL=zh_HK.big5hkscs locale charmap
BIG5-HKSCS

Most of the problematic characters are the ones whose encoding
ends in byte 0x5c (which happens to be backslash in ASCII, or in
BIG5-HKSCS when standing alone).

$ LC_ALL=zh_HK.big5hkscs bash -xc "[[ $'\u3b1' = $'\u3b1' ]]" 2>&1 | sed -n l
+ [[ \243\\ = \\\243\\ ]]$

Note that

bash -xc $'[[ \u3b1 = \u3b1 ]]'


also returns false in those locales.

There are similar problems for locales using BIG5, GB18030 or GBK charsets.

Same with "case" or

a=$'\u3b1'; [[ $a = $a ]]
or
[[ "$a" = "$a" ]]
or ${a#"$a"}

[ "$a" = "$a" ] is fine.

The CA and EA ones do look a lot like a bug in glibc's locale
definition or gconv module (and the CB, EB ones are a
consequence of it):

$ LC_ALL=zh_HK.big5hkscs bash -xc "[[ $'\uca' = $'\uca' ]]" 2>&1 | sed -n l
+ [[ '' = \\\210f ]]$

A $'\uanything' following a $'\uca' always yields 0x88 0x66
(which happens to be the BIG5-HKSCS encoding of U+00CA) in
bash, zsh and ksh93 (though only for anything >= 0x80 in bash).

Those locales are problematic and should be avoided in general.
The problem is that they are often *available*, so all those
corner cases caused by some characters containing ASCII ones can
be exploited (think of sudo or many sshd deployments letting
LC_* variables through, for instance).

-- 
Stephane



Re: The "-e" test ingores broken links

2016-10-15 Thread Stephane Chazelas
2016-10-14 07:08:22 +0700, Peter & Kelly Passchier:
> WHich docs?
> If I do "help test" it states: "All file operators except -h and -L are
> acting on the target of a symbolic link, not on the symlink itself, if
> FILE is a symbolic link."
[...]

Yes, to test for file existence, the syntax is

[ -e "$file" ] || [ -L "$file" ]

Or:

ls -d -- "$file" > /dev/null 2>&1

But even then, if it returns false, that doesn't necessarily
mean the file doesn't exist. It could also be that it's
impossible to tell. If you remove the 2>&1 above, the error
message would help you differentiate between the cases.
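
As a rough sketch (mine, not from the original message), one can
capture the error text and match on it; the wording is
implementation- and locale-dependent, hence the LC_ALL=C:

case $(LC_ALL=C ls -d -- "$file" 2>&1 > /dev/null) in
  ('') echo "exists";;
  (*'No such file or directory'*) echo "does not exist";;
  (*) echo "cannot tell";;
esac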

If using zsh instead of bash, you can also check the $ERRNO
variable to see if [ -e ] failed because of ENOENT or something
else.
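
For instance (illustrative; 2 is the value of ENOENT on Linux):

$ zsh -c '[ -e /no/such/file ]; echo $ERRNO'
2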

See also 
https://stackoverflow.com/questions/638975/how-do-i-tell-if-a-regular-file-does-not-exist-in-bash/40046642#40046642

-- 
Stephane




Re: Question about arithmetic expression grammar

2016-10-10 Thread Stephane Chazelas
2016-10-08 17:33:00 +0200, Conrad Hoffmann:
[...]
>   $ TEST=5; echo $((--TEST+++3)) # outputs 7
> 
> However, due to the documented operator precedence, I would have
> expected that expression to be equal to:
> 
>   $ TEST=5; echo $((--(TEST++)+3)) # outputs 8
> 
> Instead, though, it seems to be equal this one:
> 
>   $ TEST=5; echo $(((--TEST)+++3)) # outputs 7
> 
> So my qestions are:
> 
> Is this a bug? Or is this something that can't be resolved due
> ambiguities in the grammar? Or what's going on here at all?
[...]

The -- and ++ operators are optional in POSIX. That means you
can't use them in POSIX scripts, and that if you need to combine
two unary - or +, or a binary - with a unary -, you need to use
spaces or parentheses:

$((1--1)) # unspecified
$((--1)) # unspecified
$((--var)) # unspecified
$((1 - -1)) # OK
$((- -1)) # OK
$((1-(-1))) # OK
$((-(-1))) # OK

Now, if we look at the C spec, the way +++ is parsed comes down
to tokenisation, which also grabs the longest operator first
(maximal munch).

There, --TEST+++3 would be tokenised as -- TEST ++ + 3, which
would lead to a syntax error as TEST++ isn't an lvalue.

bash works differently.

From what I understand from past discussions on the subject
here, bash doesn't treat it as a syntax error and instead tries
to tokenise those incorrect ++/-- into multiple + or - operators
if possible.

So here, --TEST+++3 is:

--TEST + +(+3)

And --(TEST++)+3

would be: -(-(TEST++))+3
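
Spelling those parses out with explicit spaces and parens
reproduces both results (illustrative):

$ TEST=5; echo $(( --TEST + +(+3) ))
7
$ TEST=5; echo $(( -(-(TEST++)) + 3 ))
8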

-- 
Stephane



Re: Command substitution optimisation in dot scripts

2016-09-30 Thread Stephane Chazelas
2016-09-30 04:49:33 +0100, Martijn Dekker:
[...]
>   my_subshell_pid=$(sh -c 'echo $PPID')
> 
> This works fine on every shell, except on bash when a dot script is
> being executed.
[...]

While it does look like a bug, you could always do:

my_subshell_pid=$(exec sh -c 'echo $PPID')

To be sure you know where you stand. bash or other shells could have
issues if there's an EXIT trap in place that propagates to subshells or
any other reason that prevents them from optimising out the fork.

Note that it's the same in:

$ bash -c 'p=$(sh -c "echo \"\$PPID\""); echo "$p $BASHPID"'
1513 1512
$ bash -c 'p=$(exec sh -c "echo \"\$PPID\""); echo "$p $BASHPID"'
1515 1515

-- 
Stephane



[minor] [[ "\\" =~ [^]"."] ]] returns false

2016-09-16 Thread Stephane Chazelas
That's a special case of the

[[ "\\" =~ ["."] ]]

returning true (because bash called regcomp("[\\.]") instead of
regcomp("[.]")) that I had reported some time ago and that was
then fixed.

Here, it's similar:

$ bash -c '[[ "\\" =~ [^]"."] ]]' || echo unexpected
unexpected

$ ltrace -e regcomp bash -c '[[ "\\" =~ [^]"."] ]]'
bash->regcomp(0x7ffc146e78c0, "[^]\\.]", 1)  = 0

I suspect bash thinks the first "]" closes the bracket
expression and thus assumes the "." is outside it, so needs to
be escaped for the RE engine as it's quoted in the bash RE.
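
A common workaround (a note from me, not part of the original
report) is to keep the ERE in a variable, which bypasses bash's
requoting of the pattern entirely:

$ re='[^].]'
$ [[ "\\" =~ $re ]] && echo matches
matches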

-- 
Stephane



issues with SHLVL, subshells and implicit "exec"s

2016-09-03 Thread Stephane Chazelas
Hello,

that came up when discussing a related bug in zsh
(http://www.zsh.org/mla/workers/2016/msg01574.html)

$SHLVL is a feature introduced by tcsh (at least by some patches
to tcsh) in the early eighties, which is meant to represent the
depth of the stack of shells executing each other.

It's not POSIX, but it's supported by tcsh, bash, zsh, ksh93
and busybox sh at least.

It's not incremented for subshells, only for shell invocations.

Where the behaviour differs is when using "exec".

All of tcsh, bash and zsh (but not ksh93 nor busybox sh)
decrement SHLVL when an "exec" would replace the shell process
with another command.

That makes sense because in

bash(1)
 |
 `- bash(2)
 |
 `- bash(3) -c 'exec bash(4)'

for instance, that bash(4) will end up being the child of
bash(2), just like bash(3). So bash(3) decrements SHLVL before
invoking bash(4), which will increment it again.

Now, and it's my first bug.

In:

$ echo "$SHLVL $$"
2 6192
$ bash -c bash
$ echo "$SHLVL $$ $PPID"
4 6193 6192

SHLVL has been increased by *2* even though that new bash shell
is a direct child of the previous one. That's because that "bash
-c bash" is actually a "bash -c 'exec bash'". That is bash
optimises out the fork for the execution of that one command.
While SHLVL is decremented for an explicit "exec", it is not for
an implicit one which IMO is a bug
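
With an explicit "exec", the accounting comes out right
(illustrative):

$ echo "$SHLVL"
2
$ bash -c 'exec bash'
$ echo "$SHLVL"
3

The intermediate shell decrements SHLVL before the exec and the
new shell increments it again.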

Second bug:

$ SHLVL=1 bash -c 'echo "$SHLVL"; (exec bash); echo done'
2
$ echo "$SHLVL"
2
$ exit
done

There, bash has decremented $SHLVL upon "exec bash" even though
it was in a subshell. So the new bash shell ended up with the same
SHLVL as its parent.

tcsh has the same bug.

(tested with the head of the "devel" git branch)

-- 
Stephane



Re: redirection inside a process-substitution

2016-08-29 Thread Stephane Chazelas
2016-08-27 02:35:08 +0200, Helmut Karlowski:
[...]
> >I speculate that this has to do with something that zsh does to force
> >appending, whether that's lseek or something else, other than the fact
> >that zsh doesn't seem to use /dev/fd at all (I think it just straight
> >uses pipes).  Bash doesn't do anything special with tb.err after opening
> 
> zsh uses temp-files for all process-substitution, which limits it's
> features but is easier to do.
[...]

No, zsh only uses temp files for the =(...) form of process
substitution. For <(...) and >(...), it uses pipes and
/dev/fd/x, or named pipes if the system doesn't support
/dev/fd/n, like bash does. On cygwin:

$ zsh -c 'ls -ld <(:)'
prw------- 1 user None 0 Aug 29 13:52 /tmp/zshNyQjeh

It does use named pipes (not regular files) there; possibly that
version of zsh was built before /dev/fd/x support was added to
cygwin. On Linux:

$ zsh -c 'ls -ld <(:)'
lr-x------ 1 user group 64 Aug 29 13:54 /proc/self/fd/12 -> pipe:[81899]
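
For comparison, bash on Linux resolves the same way (output
illustrative):

$ bash -c 'ls -ld <(:)'
lr-x------ 1 user group 64 Aug 29 13:54 /dev/fd/63 -> pipe:[81900]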


For > >(file), it optimises the pipe()+open("/dev/fd/x")
combination down to just the pipe.

In any case, whether it's unnamed or named pipes, it's
functionally equivalent. The difference between bash and zsh
here, I suppose, is down to the order in which the files are
opened.

-- 
Stephane



Re: Intriguing error with arithmetic evaluation

2016-08-24 Thread Stephane Chazelas
2016-08-23 12:26:37 -0400, Greg Wooledge:
[...]
> > ksh -c '((0)); echo X'
> > 
> > outputs X
> 
> You forgot the -e.  Here's ksh88:
> 
> $ ksh -e -c '((0)); echo X'
> $ uname -a
> HP-UX imadev B.10.20 A 9000/785 2008897791 two-user license
[...]

Oops.

I did test with -e on Solaris though:

$ Version M-11/16/88i
$ ksh -ec '((0)); echo X'
X
$ /usr/xpg4/bin/sh -ec '((0)); echo X'
X

(That sh being a slightly modified version of ksh88i to address
POSIX compliance IIRC).

Possibly HP-UX changed it? Sounds more likely than 
Solaris changing it the other way round.

What version of ksh is it based on?

-- 
Stephane




Re: Intriguing error with arithmetic evaluation

2016-08-23 Thread Stephane Chazelas
2016-08-12 14:22:32 -0400, Chet Ramey:
[...]
> The relevant change was probably the change in the set of commands to which
> `set -e' applies.  The (( command (among others) was added to that list
> in bash-4.1.  The change was the result of Posix changing the semantics
> of the errexit option and expanding its scope from simple commands to
> all commands.
> 
> The (( command returns 1 if the expression evaluates to 0.  When `level' is
> 0, level++ returns a 0 value, and (( returns a status of 1.
[...]

POSIX doesn't specify ((...)) (it explicitly leaves it
unspecified), so it is out of POSIX scope anyway.

It was introduced by ksh88.

There and in ksh93 (but not pdksh nor zsh)

ksh -c '((0)); echo X'

outputs X

For:

ksh -ec '[[ -z . ]]; echo X'

I see a difference between ksh88 (Solaris /usr/bin/ksh) which
displays the X (like bash<4.1) and ksh93 (u+) which doesn't any more.

In any case, I'd go with Greg's advice to avoid "set -e".

-- 
Stephane



printf %q doesn't quote blanks other than space and tab

2016-06-02 Thread Stephane Chazelas
bash treats all blank characters in the locale (except
multibyte ones, bug reported earlier) as token delimiters.

Yet printf %q only quotes space and tab, not the other ones.

For instance, on Solaris in locales using the iso8859-1
character set, 0xa0 (non-breaking space) is a single-byte blank
character.

That has security implications because you often (generally?)
use printf %q to generate safe code to pass to eval. For instance:

$ out() { echo "Got $# arguments:"; printf '<%s>\n' "$@"; }
$ (set -x; printf -v code 'out %q' $'a\xa0b'; eval "$code")
+ printf -v code 'out %q' a�b
+ eval 'out a�b'
++ out a b
++ echo 'Got 2 arguments:'
Got 2 arguments:
++ printf '<%s>\n' a b
<a>
<b>

That "out" function was passed 2 arguments instead of the
expected 1.
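
One way to sidestep the issue entirely (my suggestion here, not
a fix for %q) is to avoid eval and pass the data through an
array instead:

# the NBSP byte stays data; nothing gets re-parsed by eval
args=( $'a\xa0b' )
out "${args[@]}"    # out sees exactly 1 argument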

The impact is limited to systems that have locales with
single-byte blank characters (besides Solaris, I don't know of
any other one).

(The code above was tested with the ancient 3.2.51 on Solaris 10.
Given that "printf '%q\n' $'\u2006'" doesn't quote either in a
UTF-8 locale where that character is blank, on Debian with
bash 4.4-beta, I assume it's the same in newer bash versions.)

-- 
Stephane



Re: Avoid asterisk expansion when it selects "everything"

2016-04-14 Thread Stephane Chazelas
2016-04-13 11:23:01 +, Anis ELLEUCH:
> Hello everybody,
> 
> I would like to ask if it is possible to disable expanding asterisk when it
> selects all entries ?
> 
> `$ rm * .jpg` with a mistaken space between asterisk and .jpg will delete
> everything in your home directory or in the entire disk.
> 
> In my opinion, when the user asks to select "everything" which could be `*`
> or `path/*`, bash has to show a confirmation prompt to check if the user
> was not mistaken, this option should be obviously disabled by default
> 
> Another idea: `*` and `/*` should not be interpreted and the user has to
> enter another sequence "more powerful" to emphasize selecting all entries (
> `^*` would it work just fine ?)
[...]

zsh does that by default:

$ rm * .jpg
zsh: sure you want to delete all the files in /tmp [yn]?

(disabled with "setopt RM_STAR_SILENT")

Also in tcsh, though not enabled by default there:

> set rmstar
> rm *
Do you really want to delete all files? [n/y]

(they match on "rm *" or "rm dir/*")

For bash, you can try this approach:
https://unix.stackexchange.com/questions/108803/preventing-deletion-of-system-shell-aliased-folders/108854#108854

-- 
Stephane



Re: bash "while do echo" can't function correctly

2016-04-13 Thread Stephane Chazelas
2016-04-13 08:55:16 -0400, Greg Wooledge:
[...]
> > And if you want to keep eventual spurious characters after the
> > last NL character in the file:
> > 
> > while IFS= read -r line; do printf '%s\n' "$line"; done < test.txt
> > [ -z "$line" ] || printf %s "$line"
> 
> Another way to write that is:
> 
> while IFS= read -r line || [[ $line ]]; do ... done < test.txt
[...]

Except that it would add an extra newline character (which may
be desired as well).

-- 
Stephane



Re: bash "while do echo" can't function correctly

2016-04-13 Thread Stephane Chazelas
2016-04-13 08:10:15 +0200, Geir Hauge:
[...]
> while read -r line; do echo "$line"; done < test.txt
> 
> though printf should be preferred over echo:
> 
> while read -r line; do printf '%s\n' "$line"; done < test.txt
[...]

Actually, you also need to empty $IFS

while IFS= read -r line; do printf '%s\n' "$line"; done < test.txt

And if you want to keep any spurious characters after the
last NL character in the file:

while IFS= read -r line; do printf '%s\n' "$line"; done < test.txt
[ -z "$line" ] || printf %s "$line"

For details, see:

https://unix.stackexchange.com/questions/209123/understand-ifs-read-r-line
https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

-- 
Stephane



Re: mv to a non-existent path now renames instead of failing

2016-03-19 Thread Stephane Chazelas
2016-03-17 09:00:37 -0600, Eric Blake:
[...]
> That said, if you WANT an error if 'two/' does not exist, and to move
> 'one' to 'two/one' if 'two/' does exist, you can always use:
> 
> mv one two/.
> 
> where the trailing '.' changes the semantics required of the rename()
> call, and forces an error if 'two/' does not exist.
[...]

See also the GNU-specific

mv -t two one

To move one into two.

For the reverse, to force a move-to as opposed to a move-into,
there's another GNU-specific option:
mv -T one two

if two is a directory, you'll get an error. If two is a symlink
(to directory or other), one is renamed to two (and the symlink
is gone).

FreeBSD mv has:

mv -h one two

To do a move-to instead of move-into when "two" is a symlink to
a directory.

-- 
Stephane




Re: why does time (cmd) 2> file redirect time's output?

2016-03-09 Thread Stephane Chazelas
2016-03-09 08:04:33 -0500, Chet Ramey:
> On 3/8/16 6:04 AM, Isabella Parakiss wrote:
> 
> > 
> > This seems to be a related problem:
> > $ time (exec true)  # doesn't print anything
> 
> Timing is an attribute associated with a command.  In this case, that's the
> simple command (`exec true') that is run in a subshell.  When that command
> is executed, the shell that is started to run the subshell and print the
> timing statistics is overwritten by the `exec true'.
[...]

Or in other words, 

time (cmd) [redirections]

is actually interpreted as if you had entered:

(time { cmd; }) [redirections]
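
For instance (an illustrative check):

$ bash -c '(time { uname; }) 2> /dev/null'
Linux

prints no timing, just like "time (uname) 2> /dev/null" does.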

As you've confirmed it was the intended behaviour (and anyway
some people rely on it), it would be worth documenting IMO.

-- 
Stephane



Re: GLOBIGNORE documentation

2016-03-07 Thread Stephane Chazelas
2016-03-07 08:58:05 +0100, Isabella Parakiss:
[...]
> OTOH this is arguably more useful than its ksh equivalent:
> GLOBIGNORE=-*; some-cmd *; some-cmd ./*
[...]

True, that's probably the one case where the GLOBIGNORE
behaviour is actually useful.

Note that with ksh93, you've got to write it:

FIGNORE='@(.|..|-*)'

That is, you need to exclude "." and ".." manually.

-- 
Stephane



GLOBIGNORE documentation

2016-03-06 Thread Stephane Chazelas
Today, I realised that GLOBIGNORE doesn't work at all like ksh's
FIGNORE.

With

GLOBIGNORE=x*

we're not filtering out files whose *name* starts with "x" from
globs but those whose *path* starts with "x".

In

echo *

files whose name starts with "x" will be excluded, but not in

echo ./*

I think the documentation should be clarified, because at the
moment it implies GLOBIGNORE applies to file names (like for
ksh's FIGNORE), not file paths.

Where it becomes borderline a bug is with "." and "..".

The doc says they're always excluded when GLOBIGNORE is
non-empty.

That's true for */* or .* for instance, but not for ./.* or
.*/x.

$ bash -c 'GLOBIGNORE=x*; echo .*'
.*
$ bash -c 'GLOBIGNORE=x*; echo ./.*'
./. ./..
$ bash -c 'GLOBIGNORE=x*; echo .*/a'
./a ../a

To truly exclude . and .., one needs:

shopt -s extglob
GLOBIGNORE='?(*/)@(.|..)'

-- 
Stephane




Re: GLOBIGNORE documentation

2016-03-06 Thread Stephane Chazelas
2016-03-06 22:16:58 +, Stephane Chazelas:
[...]
> $ bash -c 'GLOBIGNORE=x*; echo .*'
> .*
> $ bash -c 'GLOBIGNORE=x*; echo ./.*'
> ./. ./..
> $ bash -c 'GLOBIGNORE=x*; echo .*/a'
> ./a ../a
> 
> To truely exclude . and .., one needs:
> 
> shopt -s extglob
> GLOBIGNORE='?(*/)@(.|..)'
[...]

That's not enough: that would fail to exclude . and .. in
.*/file (and it also breaks */. globs).

GLOBIGNORE='?(*/)@(.|..)?(/*)' would break (common) ./* globs.

So, it looks like it's not possible to get the same behaviour as
in zsh/mksh/pdksh/fish/Forsyth shell (or ksh93 with
FIGNORE='@(.|..)') with GLOBIGNORE after all.

-- 
Stephane




Re: [minor] "precision" of $SECONDS

2016-02-25 Thread Stephane Chazelas
2016-02-25 10:48:51 -0500, Chet Ramey:
[...]
> Because bash doesn't have floating point arithmetic.

Yes, makes sense. mksh having $EPOCHREALTIME floating point even
though it doesn't have floating point arithmetic does sound
weird.

Any plan of adding floating point arithmetic support to bash by
the way?

> There's no
> real reason to have $SECONDS in a format you can't use to perform
> calculations.

That could be done with an extra $NANOSECONDS variable, but then
that wouldn't be reliable: in now=$SECONDS.$NANOSECONDS,
$SECONDS and $NANOSECONDS could be expanded in different seconds
(if run, for instance, at 00:00:00.999).

A printf '%(sec=%s nsec=%N)T' -1 wouldn't have the problem though.

> Bash's %T implementation doesn't have %N because it uses the libc
> strftime(3), and as far as I know, no strftime provides it.  I assume
> that ksh93 implements it internally as part of libast.
[...]

Probably. Note that GNU date also has a %N and doesn't use
strftime either. strftime, taking a struct tm, can't carry
subseconds anyway.

-- 
Stephane



Re: [minor] "precision" of $SECONDS

2016-02-25 Thread Stephane Chazelas
2016-02-25 13:18:17 +, Stephane Chazelas:
[...]
> > function __age { declare ns=$(date +"%N"); declare -i
> > ms=${ns##+(0)}/100;
> >  printf "%4d.%03d\n" $SECONDS $ms
> > }
> [...]
> 
> I'm not sure how that gives you the time since startup.
> Currently, if bash is started at
> 
> 00:00:00.7
> 
> After 0.4 seconds (at 00:00:01.1), $SECONDS will be 1 (the "bug"
> I'm raising here). "ms" will be 100, so you'll print 1.100
> instead of 0.600. And with my suggested fix, you'd print 0.100.
 0.400

Sorry, meant 0.400 above.

-- 
Stephane




Re: [minor] "precision" of $SECONDS

2016-02-25 Thread Stephane Chazelas
2016-02-25 03:03:41 -0800, Linda Walsh:

> Stephane Chazelas wrote:
> >$ time bash -c 'while ((SECONDS < 1)); do :; done'
> >bash -c 'while ((SECONDS < 1)); do :; done'  0.39s user 0.00s system 99% cpu 
> >0.387 total
> >
> >That can take in between 0 and 1 seconds. Or in other words,
> >$SECONDS becomes 1 in between 0 and 1 second after the shell was
> >started.
> The format you are using to display output of 'time' doesn't show
> real time -- only CPU seconds.

It does. The last number (0.387 total) is the elapsed time in
the output of the "time" keyword of zsh (my interactive shell).
CPU times are 0.39s user and 0.00s system, totalling 0.39s.

Because it's a busy loop, CPU time is close to 100% (99%) so
elapsed and CPU time are roughly the same.

> Try:
> 
> TIMEFORMAT='%2Rsec %2Uusr %2Ssys (%P%% cpu)'

That would be for bash. In any case, bash does already include
the elapsed time in its default time output, like zsh.

But the problem here is not about the time keyword, but about the
$SECONDS variable.

[...]
>With linux, one can read /proc/uptime to 100th's of a sec, or
> use date to get more digits.  A middle of the road I used for
> trace timing was something like:
> 
> function __age { declare ns=$(date +"%N"); declare -i
> ms=${ns##+(0)}/100;
>  printf "%4d.%03d\n" $SECONDS $ms
> }
[...]

I'm not sure how that gives you the time since startup.
Currently, if bash is started at

00:00:00.7

After 0.4 seconds (at 00:00:01.1), $SECONDS will be 1 (the "bug"
I'm raising here). "ms" will be 100, so you'll print 1.100
instead of 0.600. And with my suggested fix, you'd print 0.100.

[...]
> As you can see, I wanted the times
> relative to the start of a given script, thus used SECONDS for that.

Note that all of zsh, ksh93 and mksh have builtin support to get
elapsed time information with subsecond granularity.

zsh has:
  - $SECONDS: time since shell start. floating point after
typeset -F SECONDS
  - $EPOCHSECONDS (unix time) (in zsh/datetime module)
  - $EPOCHREALTIME: same as floating point
  - zselect builtin to sleep with 1/100s granularity
(in zsh/zselect module)
  - the "time" keyword, without a command prints CPU and real
time for the shell and waited-for ancestors (and other
getrusage statistics you can add with TIMEFMT)

ksh93 has:
  - $SECONDS: time since shell start. floating point after
typeset -F SECONDS
  - EPOCHREALTIME=$(printf '%(%s.%N)T' now) for unix time as a
float (note that you need a locale where the decimal
separator is a period to be able to use that in ksh
arithmetic expressions, or you need to replace that "." with
a "," above).
  - builtin sleep with sub-second granularity

mksh has:
  - $EPOCHREALTIME: unix time as floating point (note however
  that mksh doesn't support floating point arithmetic).
  - builtin "sleep" command with sub-second granularity.

Similar features would be welcome in bash.

bash has "times" that gives you CPU time with sub-second
granularity. It's got a "printf %T" a la ksh93, but no %N, its
$SECOND is only integer (and currently has that issue discussed
here).

It does support a $TMOUT with sub-second granularity though.
You can use that to sleep for sub-second durations if you find a
blocking file to read from. On Linux, that could be:

TMOUT=0.4 read < /dev/fd/1 | :

but that still means forking processes.

-- 
Stephane



[minor] "precision" of $SECONDS

2016-02-24 Thread Stephane Chazelas
$ time bash -c 'while ((SECONDS < 1)); do :; done'
bash -c 'while ((SECONDS < 1)); do :; done'  0.39s user 0.00s system 99% cpu 
0.387 total

That can take in between 0 and 1 seconds. Or in other words,
$SECONDS becomes 1 in between 0 and 1 second after the shell was
started.

The reason seems to be that the shell records the value
returned by time() upon start-up and $SECONDS expands to
time()-that_saved_time. So, if bash is started at 10:00:00.999,
then $SECONDS will become 1 only a millisecond after startup,
while if it's started at 10:00:01.000, $SECONDS will become 1 a
full second later.

IMO, it would be better if gettimeofday() or equivalent was used
instead of time(), so that $SECONDS is incremented exactly one
second after start-up, like ksh93 does.
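
For comparison (illustrative, no exact figures implied):

$ time ksh93 -c 'while ((SECONDS < 1)); do :; done'

should consistently take close to one full second of elapsed
time, whatever the sub-second offset at which ksh93 started.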

mksh and zsh behave like bash (I'll raise the issue there as
well).

With zsh (like in ksh93), one can do "typeset -F SECONDS" to
make $SECONDS floating point, which can be used as a workaround
for the "issue".

-- 
Stephane




Re: Comma expression in arithmetic evaluation referring to arrays make bash crash.

2016-02-15 Thread Stephane Chazelas
[...]
> Reproduced with 4.2.53 on Debian:
[...]



Actually, it was already reported in early 2013 and fixed for
4.3:

http://thread.gmane.org/gmane.comp.shells.bash.bugs/19384

-- 
Stephane



Re: Comma expression in arithmetic evaluation referring to arrays make bash crash.

2016-02-15 Thread Stephane Chazelas
2016-02-15 09:31:57 -0500, Chet Ramey:
> On 2/15/16 8:57 AM, Pontus Stenström wrote:
> 
> > Bash Version: 4.2
> > Patch Level: 24
> > Release Status: release
> > 
> > Description:
> > Comma expression in arithmetic evaluation referring to arrays make bash
> > crash.
> > 
> > Repeat-By:
> > This works fine:
> > ((c=3, d=4))
> > This crashes my bash:
> > a=(2 3 4 5)# OK
> > ((c=a[3], d=a[2])) # Crash
> 
> It runs fine on bash-4.3.42 on RHEL 5 and Mac OS X.
[...]

Reproduced with 4.2.53 on Debian:

Starting program: bash4.2.53 -c a=\(1\ 2\ 3\ 4\ 5\ 6\)\;\ \(\(b=a\[3\],\ 
c=a\[4\]\)\)\;\ typeset\ -p\ b\ c

Program received signal SIGSEGV, Segmentation fault.
strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x0043c952 in expr_bind_array_element (tok=tok@entry=0x6f5328 "c", 
ind=ind@entry=3, rhs=rhs@entry=0x6f5318 "5") at expr.c:331
#2  0x0043e2c8 in expassign () at expr.c:531
#3  0x0043d532 in expcomma () at expr.c:441
#4  0x0043d736 in subexpr (expr=0x6fb7c8 "b=a[3], c=a[4]") at expr.c:419
#5  0x0043e5ca in evalexp (expr=0x6fb7c8 "b=a[3], c=a[4]", 
validp=0x7fffda90) at expr.c:384
#6  0x004321d8 in execute_arith_command (arith_command=<optimized out>, 
arith_command=<optimized out>) at execute_cmd.c:3309
#7  execute_command_internal (command=0x6fb508, asynchronous=0, 
pipe_in=7320904, pipe_out=0, fds_to_close=0x6fdc88) at execute_cmd.c:901
#8  0x00432859 in execute_connection (fds_to_close=<optimized out>, 
pipe_out=<optimized out>, pipe_in=<optimized out>, asynchronous=<optimized out>,
command=<optimized out>) at execute_cmd.c:2326
#9  execute_command_internal (command=0x6fb5c8, asynchronous=0, pipe_in=-1, 
pipe_out=-1, fds_to_close=0x6fb7a8) at execute_cmd.c:891
#10 0x00433fce in execute_command (command=0x6fb5c8) at 
execute_cmd.c:382
#11 0x0043281e in execute_connection (fds_to_close=<optimized out>, 
pipe_out=<optimized out>, pipe_in=<optimized out>, asynchronous=<optimized out>,
command=<optimized out>) at execute_cmd.c:2324
#12 execute_command_internal (command=0x6fb748, asynchronous=0, pipe_in=-1, 
pipe_out=-1, fds_to_close=0x6fb788) at execute_cmd.c:891
#13 0x00471024 in parse_and_execute (string=<optimized out>, 
from_file=from_file@entry=0x4a990d "-c", flags=flags@entry=4) at 
evalstring.c:340
#14 0x0041d9ba in run_one_command (command=<optimized out>) at 
shell.c:1315
#15 0x0041c786 in main (argc=3, argv=0x7fffdf78, 
env=0x7fffdf98) at shell.c:688

See how it calls expr_bind_array_element on "c" as if it wanted
to assign something to c[3] instead of c. The 3 looks like it
comes from the previous a[3] expansion.

-- 
Stephane



Re: capturing in ${VAR//(*(\/)(+([^\/]))?(\/)/_${BASH_REMATCH[1]}_}

2016-02-08 Thread Stephane Chazelas
2016-02-08 09:00:09 -0500, Chet Ramey:
> On 2/8/16 2:47 AM, Linda Walsh wrote:
> > When you are doing a var expansion using the
> > replacement format ${VAR//./.}, is there some way to
> > put parens around some part of the expression and reference
> > them as in the [[V~re]] RE-matches?
> 
> No.  Shell patterns do not have backreferences.
[...]

Note that the feature is available in other shells and quite
handy there. It could be worth adding to bash:

$ zsh -o extendedglob -c 'a=1234; echo ${a//(#b)(?)(?)/${match[2]}${match[1]}}'
2143
(#b) to activate back-references stored in $match array.

$ zsh -o extendedglob -c 'a=1234; echo ${a//(#m)?/<$MATCH>}'
<1><2><3><4>
(#m) to record the matched portion in $MATCH.

Though I suspect for bash you would prefer the ksh93 syntax:

$ ksh93 -c 'a=1234; echo ${a//@(?)@(?)/\2\1}'
2143

-- 
Stephane



Re: bash can't distinguish between empty and unset arrays

2016-02-03 Thread Stephane Chazelas
2016-02-03 23:43:37 +, Martijn Dekker:
> bash treats an empty array as if it were an unset variable, which seems
> very illogical as empty is quite distinct from unset:
> 
> $ myarray=()
> $ [[ -v myarray ]] && echo set || echo unset
> unset

[[ -v var ]] is for scalar variables (AFAICT). In bash, as in
ksh, arrays overload scalars to some extent (as in $var is
${var[0]}).

$ bash -c 'a[1]=x; [[ -v a ]]' || echo unset
unset
$ bash -c 'a[1]=x; [[ -v a[@] ]]' && echo set
set
$ bash -c 'a=(""); [[ -v a[@] ]]' && echo set
set

In:

$ bash -c 'a=(); [[ -v a[@] ]]' || echo unset
unset

It still returns "unset" as none of the elements are set.

You can always check whether the variable is "bound" with
"declare -p":


declare -p array >/dev/null 2>&1 && echo set

I don't know how one can check whether the variable is
/declared/ or not though (when it's not "bound").


> $ set | grep ^myarray=# yet, it's set:
> myarray=()
> $ set -u
> $ for i in "${x[@]}"; do :; done
> bash: x[@]: unbound variable
> 
> Note also that the "unbound variable" error is inconsistent with the
> behaviour of "$@"; I would have thought that, logically, "$@" and
> "${x[@]}" should behave the same way, since arrays are implemented as a
> logical extension of the positional parameters concept.

Agreed.
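
A minimal side-by-side (illustrative; behaviour as observed with
bash 4.3):

$ bash -uc 'set --; for i in "$@"; do :; done; echo OK'
OK
$ bash -uc 'x=(); for i in "${x[@]}"; do :; done; echo OK'
bash: x[@]: unbound variable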

> zsh and ksh93 can distinguish between empty and unset arrays.
[...]

In zsh, arrays are a clearly distinct type.

You may want to check the archive; I believe it's been
discussed before.

-- 
Stephane



why does time (cmd) 2> file redirect time's output?

2016-01-27 Thread Stephane Chazelas
Hello,

(tested with bash-4.3 and bash-2.05b)

"time" is meant to time a pipeline. That generally includes
redirections of the last command in the pipeline, but in the
case of a subshell alone, the redirection is not timed(*) and
seems to affect the output of "time" itself.

That's quite inconsistent, and different from other shells.

$ bash -c 'time (uname) 2> /dev/null'
Linux
$ bash -c 'time uname 2> /dev/null'
Linux

real0m0.002s
user0m0.000s
sys 0m0.001s
$ bash -c 'time { uname;} 2> /dev/null'
Linux

real0m0.001s
user0m0.000s
sys 0m0.001s
$ bash -c 'time (uname) | wc -c 2> /dev/null'
6

real0m0.002s
user0m0.000s
sys 0m0.003s
$ bash -c 'time (uname) | (wc -c) 2> /dev/null'
6

real0m0.003s
user0m0.000s
sys 0m0.003s
$ bash -c 'time (uname) 2> /dev/null | (wc -c) 2> /dev/null'
6

real0m0.002s
user0m0.001s
sys 0m0.002s
$ ksh -c 'time (uname) 2> /dev/null'
Linux

real0m0.00s
user0m0.00s
sys 0m0.00s
$ zsh -c 'time (uname) 2> /dev/null'
Linux
( uname; ) 2> /dev/null  0.00s user 0.00s system 89% cpu 0.002 total
$ mksh -c 'time (uname) 2> /dev/null'
Linux
0m0.00s real 0m0.00s user 0m0.00s system



Is there a reason for that?

Note that it may be too late to change it as some people seem to
rely on it. See for instance:

https://unix.stackexchange.com/questions/12068/how-to-measure-time-of-program-execution-and-store-that-inside-a-variable/12069#12069

But it could be worth documenting.

(*)

well, time (cmd) < <(busy-cmd)
does seem to time "busy-cmd".

-- 
Stephane



Re: Future date

2016-01-25 Thread Stephane Chazelas
2016-01-25 08:23:10 -0500, Greg Wooledge:
[...]
> Just to be clear, the -d 'human readable stuff' option is specific to GNU
> date, and won't work on other systems.  Also, the 'human readable stuff'
> part is NOT specified.  There is no documentation for what is allowed
> there, and what is not.  You have to figure it out by trial and error,
> and it may change between versions of GNU date.
[...]

That's a bit exaggerated. There's
https://www.gnu.org/software/coreutils/manual/html_node/Date-input-formats.html

At least it's more documented (and less buggy in my experience)
than what ksh93 accepts for its:

printf '%(%F %T %z)T\n' 'date specification'

-- 
Stephane


