Re: read command sometimes misses newline on timeout

2024-10-05 Thread Martin D Kealey
The read builtin could return an exit status of (128|SIGALRM) in two
circumstances:

1. If a signal is caught during the read syscall, then the read system
call returns -1 with EINTR and the error is reported. (Otherwise it must
return a positive byte count, in which case the built-in continues until it
gets a delimiter.)

2. If a signal is caught between read syscalls, it could (by a variety of
mechanisms) replace the exit status of the read built-in with a non-zero
number. Presumably this is what you're seeing in fuzzing?

I will take a look at builtins/read.def when I get home. I suspect it's
looking at the "have I received SIGALRM" flag before looking at the "have I
read a delimiter" flag; I will report back on what I find.

On Fri, 4 Oct 2024, 22:18 Thomas Oettli via Bug reports for the GNU Bourne
Again SHell,  wrote:

> Configuration Information [Automatically generated, do not change]:
> Machine: x86_64
> OS: linux-gnu
> Compiler: x86_64-pc-linux-gnu-gcc
> Compilation CFLAGS: -O2 -pipe
> uname output: Linux testserver 6.6.47-gentoo #1 SMP Tue Aug 20 09:38:16
> CEST 2024 x86_64 Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz GenuineIntel
> GNU/Linux
> Machine Type: x86_64-pc-linux-gnu
>
> Bash Version: 5.2
> Patch Level: 26
> Release Status: release
>
> Description:
> I have tried to write a bash script that asynchronously reads from
> a pipe (line by line) with the help of "read -t".
> If the timeout occurs in just the right moment, read returns the full
> line, but the return code says timeout (rc > 128).
> Therefore it is not possible to know if a full line was returned or
> not. Please see the script in the Repeat-By section that reproduces the
> error in seconds.
>
>
> Repeat-By:
> function reader() {
>   local buf line
>   while :; do
>     read -t .01 buf
>     rc=$?
>     if (( rc == 0 )); then
>       line+=$buf
>     elif (( rc > 128 )); then
>       line+=$buf
>       continue
>     fi
>     [[ $line != TEST ]] && echo Invalid line: $line && exit
>     echo OK
>     line=""
>   done
> }
> reader < <(
>   while :; do
>     echo -n TEST
>     sleep .00$(($RANDOM%10))
>     echo
>   done
> )
>
>


Re: fg via keybind modifies tty settings

2024-09-21 Thread Martin D Kealey
Does this happen with any raw-mode application, or just vim?

When using readline in Emacs mode, the terminal is necessarily in raw mode.

I suspect what you're seeing is that 'fg' bound to a key bypasses the
normal "exit readline" path that would restore the settings. Then when vim
exits or is suspended, Bash notices that it's still in raw mode (-icanon),
but doesn't otherwise know the details of exactly how you want it cooked
(+icanon, but what else?).
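In the meantime, a possible mitigation is to capture a known-good tty state
and restore it around the bound command. An untested sketch (the variable
and function names here are my own invention, not anything Bash provides):

    __tty_state=$(stty -g)          # capture cooked settings at startup
    fg_bound() {
      stty "$__tty_state"           # leave raw mode before resuming the job
      fg
      __tty_state=$(stty -g)        # re-capture in case the job changed them
    }
    bind -x '"\C-a": fg_bound'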

I'll check on this when I get back to my PC.

-Martin

On Sat, 21 Sep 2024, 09:23 David Moberg,  wrote:

> Configuration Information [Automatically generated, do not change]:
> Machine: x86_64
> OS: linux-gnu
> Compiler: gcc
> Compilation CFLAGS: -g -O2 -fno-omit-frame-pointer
> -mno-omit-leaf-frame-pointer -flto=auto -ffat-lto-objects
> -fstack-protector-strong -fstack-clash-protection -Wformat
> -Werror=format-security -fcf-protection -Wall
> uname output: Linux Tugge 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC
> Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
> Machine Type: x86_64-pc-linux-gnu
>
> Bash Version: 5.2
> Patch Level: 21
> Release Status: release
>
> Description:
> When a process/job is suspended, foregrounded via ctrl-a as a
> keybinding for fg, and then
> suspended again, the tty will be in a surprising state where no
> input is seen (-echo)
>
> Repeat-By:
>   bind -x '"\C-a":"fg"'   # bind C-A to perform fg
> vim #
> # Terminal still works
> stty -a > working.txt   # store tty settings
> # Put vim in foreground again
> :q  # Quit vim
> vim # Start a new vim session (*)
> # Terminal is broken
> stty -a > broken.txt# store tty settings
> #type text and no echo
>
> Dumping the tty configuration before and after the broken state
> reveals this diff after the
> second vim command above (*) has been executed:
>
> ```console
> $ diff working.txt broken.txt
> 4c4
> < werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
> ---
> > werase = ^W; lnext = <undef>; discard = <undef>; min = 1; time = 0;
> 6c6
> < -ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
> ---
> > -ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl ixon -ixoff
> 9c9
> < isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
> ---
> > isig -icanon iexten -echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
> ```
>
> The tty can get back into working state by running `stty sane`.
>
> By redoing the steps above, it can be seen that the
> terminal will only be broken after every second invocation.
>
> Similar issue(?) in fish that has been resolved:
> https://github.com/fish-shell/fish-shell/issues/2114
>
> I have experienced the same weirdness with applications other than
> vim, for example python3.
> I have experienced the issue in several terminals: Wezterm, xfce-4
> terminal.
>


Re: [feature request] Add ".sh" or ".bash" extension to tmpfile generated by `fc`

2024-09-20 Thread Martin D Kealey
In 2024 an editor having such a simplistic approach counts as a bug.

But perhaps adding a variable would allow anyone to nominate their own
favourite, such as BASHFC_TMPNAM=/tmp/bash-fc.$$.XX.sh

Alternatively, perhaps an extra line could be inserted at the start of the
file, like « #!fc-edit/bash »
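In the meantime there's a user-level workaround: tell the editor the
filetype instead of relying on the name. An untested sketch, assuming fc
passes ${FCEDIT} through normal shell parsing and a vim-like editor:

    FCEDIT='vim -c "setfiletype bash"'
    fc    # edit the previous command with shell highlighting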

On Fri, 20 Sep 2024, 16:41 shynur .,  wrote:

> `fc` will create a temporary file named something
> like "bash-fc.Esf9by", which seldom benefits from
> editors that use *suffixes* to infer what syntax
> highlighting should be enabled.
>
> (This improvement may not only apply to `fc`.)
>
> --
> shynur
>


Re: bash builtins mapfile issue - Unexpected parameter passing of causes rce

2024-09-14 Thread Martin D Kealey
You seem to be implying that execstr contains a value that's under the
control of the input stream in a way that would allow malicious data on the
input stream to cause the shell to invoke arbitrary code.

I read the run_callback() function, and I don't see that as plausible,
unless you're claiming there's a bug in sh_single_quote().

If that's the case, please be clear, and please provide a transcript of a
session where sh_single_quote() returns an improperly protected string,
either using mapfile or otherwise.

Otherwise it is not appropriate to issue a CVE where no vulnerability
exists.

To be clear, it is not the shell's job to stop scripts from intentionally
doing stupid things. Your "whoami" example just proves that the shell is
working exactly as it should.

If you have a case where a script provides an uncontrolled value as the
argument to -C, that warrants a CVE issued against the script, not
against Bash.
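For completeness, a quick demonstration (the file and function names are
mine) that data arriving on the input stream reaches the callback as an
inert argument, never as code:

    printf '%s\n' 'pwned; whoami; $(date)' > /tmp/mapfile-demo.txt
    cb() { printf 'index=%s line=%s\n' "$1" "$2"; }
    mapfile -t -C cb -c 1 arr < /tmp/mapfile-demo.txt
    # prints something like: index=0 line=pwned; whoami; $(date)
    # -- nothing in the data is executed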

-Martin

On Sat, 14 Sep 2024, 21:46 ~ via Bug reports for the GNU Bourne Again
SHell,  wrote:

> Dear bug-bash team:
>   I hope this email finds you well. During my recent security
> assessment of bash, I identified a potential security vulnerability that I
> believe may impact the security of your product and its users.
> here is details:
> 1、mapfile -C xxx will call run_callback
> 2、evil "execstr" parameter  passing causes rce
> mapfile.def
>
> for example in bash shell:
> echo -e
> "line1\nline2\nline3\nline4\nline5\nline6\nline7\nline8\nline9\nline10"
> > test.txt
> mapfile -t -C "whoami #111" -c 5 my_array < test.txt 
>
>
>
> I want to assign a CVE ID to the vulnerability
>
>
> I look forward to working with you to address this matter promptly and
> securely.  Please feel free to contact me directly if you have any
> questions or need further information.
>
>
> Thank you for your attention to this matter.


"make depend(s)" broken

2024-09-10 Thread Martin D Kealey
As part of merging "shopt" and "set -o", I've had to update quite a lot of
files, including adding and removing #includes.
So I thought I should run "make depends" to fix up the Makefile.

Problem 1: the generated .depends file doesn't seem to be connected to the
Makefile.

Oh well, I'll just take the .depends file and use it to update Makefile.in
... errm ... well, possible, but hardly "simple".

Is there some recommended way of doing this that doesn't involve lots of
hand-editing?

Problem 2: it appears that “make depends” and “make depend” have been
broken since:

> commit 6078dd9a9708077bb32d7027b3699a4bcc3d0a93
> Author: Chet Ramey 
> Date:   2018-04-20 11:38:52 -0400

which replaced support/mkdep with a version that lacked the “-c compiler”
option and the “--” end-of-options marker, both of which are required for
the “depends” make target:

> $(Program) $(SUPPORT_SRC)mkdep -c ${CC} -- ${CCFLAGS} ${CSOURCES}


I note that the “depend” and “depends” targets have been present since:

> commit bb70624e964126b7ac4ff085ba163a9c35ffa18f (tag: bash-2.04-tarball)
> Author: Jari Aalto 
> Date:   2000-03-17 21:46:59 +
>
(and that was just to fix a typo from when it was created in 1997)

-Martin

PS: It seems like that Makefile line should use “sh” rather than
“$(Program)”, otherwise it would be impossible to go “make depend” before
the initial build.

PPS: "mkdep" seems like a deficient exemplar of a robust shell script; lack
of quoting; broken and missing error checking; error messages to stdout
instead of stderr.


autoconf can't cope with picky compiler, typo in shmbutil.h

2024-09-10 Thread Martin D Kealey
I have this wrapper in  ~/sbin/gcc:

> #!/bin/sh
> exec /usr/bin/gcc -Werror -pedantic "$@"

so that I can fix every possible complaint about the code I'm writing.

Unfortunately, when I go “./configure --prefix=/some/where”, I get lots of
false negatives when probing for built-in functions, such as:

> checking for isblank... no


If I look in config.log I find:

> configure:15821: gcc -o conftest -g -O2   conftest.c  >&5
> conftest.c:261:6: error: conflicting types for built-in function
> 'isblank'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
>   261 | char isblank (void);
>   |  ^~~
> conftest.c:253:1: note: 'isblank' is declared in header '<ctype.h>'
>

which raises the question of *why* there's a conflicting declaration, and
*why* it's so weird, when the C23 draft (N3054) simply says:

> *7.4.1.6* The isgraph function
> Synopsis
> #include <ctype.h>
> int isgraph(int c);


I guess the short answer is "well don't do that", but it does make autoconf
feel like it's lost touch with modern reality.

Apart from that, this managed to uncover an *actual* bug:

diff --git a/include/shmbutil.h b/include/shmbutil.h
index a8a59bf1a..1feee8535 100644
--- a/include/shmbutil.h
+++ b/include/shmbutil.h
@@ -86,7 +86,7 @@ extern int locale_utf8locale; /* XXX */
 #define UTF8_SINGLEBYTE(c) (1)
 #define UTF8_MBFIRSTCHAR(c)(0)

-#defined VALID_SINGLEBYTE_CHAR(c)  (1)
+#define VALID_SINGLEBYTE_CHAR(c)  (1)

 #endif /* !HANDLE_MULTIBYTE */

-Martin


Re: Feature request: process title with exec -a "" for oneself

2024-09-02 Thread Martin D Kealey
On Sun, 1 Sept 2024 at 12:43, Lockywolf <
for_bug-bash_gnu.org_2024-09...@lockywolf.net> wrote:

> Dear Bash developers,
>
> May I ask for a small feature to be added to bash?
>
> At the moment exec changes IO redirections for the newly started
> processes, but if there is no command for exec'ing, it changes those
> redirections for the bash process itself.
>
> The exec -a, however, allows setting the current process title (argv0)
> only for the newly exec'ed processes, and "exec -a whatever" (without a
> command) does nothing.
>

Whilst I hesitate to justify choices on the basis of "consistency", I would
argue that doing *nothing* is more consistent with what happens when exec
is used to invoke a *script*, because in that case the execve kernel call
will discard any supplied argv[0] and substitute either the given command
pathname or (to safely support setuid scripts on some non-Linux systems) a
magic path to a pre-opened file descriptor.

That said, there's virtually no downside to this, and it *might* be useful
in some corner cases, as long as the script can tolerate the "do nothing"
fallback on systems where this is less feasible to implement.
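To illustrate the current behaviour (a sketch; "mytitle" is an arbitrary
name):

    bash -c 'exec -a mytitle sleep 60' &
    ps -o pid=,args= -p $!    # replacement process shows argv[0] "mytitle"
    kill $!
    exec -a mytitle           # no command: currently a silent no-op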

-Martin


Re: [PATCH 1/2] printf: fix heap buffer overflow in printf_builtin

2024-08-30 Thread Martin D Kealey
Hi Andrei

Ok, I see the problem.

This fault is triggered when the format string has '%(' but is missing the
closing ')' - so the entire remainder of the format string is tentatively
recorded as the time-format substring.

This line:

   if (*++fmt != 'T')

should be changed to:

   if (n > 0 || *++fmt != 'T')

or perhaps:

   if (*fmt == 0 || *++fmt != 'T')

(Personally I would prefer the former, since it would still reject
unbalanced parentheses even if some later code change avoids overrunning
the end-of-string.)

I note that the suggested patch amounts to (a slow version of):

   if (*fmt != 0 && *++fmt != 'T')

which avoids the overrun but fails to report the error to the user.
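For reference, a minimal trigger reconstructed from that description
(hedged: the reporter's actual fuzzing input was attached to the original
mail and is not reproduced here):

    printf '%('    # '%(' with no closing ')T'; on an affected build ASAN
                   # reports a one-byte read past the end of the format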

-Martin

On Fri, 30 Aug 2024 at 22:28, Андрей Ковалёв 
wrote:

> Hi there!
>
> I completely understand your point of view. Although I made a few
> mistakes when writing the patch, I wrote patch for a reason. I was doing
> fuzzing testing in bash4, and at some point during fuzzing, ASAN
> (AddressSanitizer) was launched. This problem also existed in the master
> branch, so I wrote a patch to fix it.
>
> Here is the ASAN trigger on the input data that I attached to this email:
>
> ==2==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x508009f8 at pc 0x55b1ce740ee0 bp 0x7fff5353bf90 sp 0x7fff5353bf88
>
> READ of size 1 at 0x508009f8 thread T0
>
>  #0 0x55b1ce740edf in printf_builtin
>
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/builtins/../../builtins/../../builtins/printf.def:492:7
>
>  #1 0x55b1ce464738 in execute_builtin
>
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:4974:13
>
>  #2 0x55b1ce4631ab in execute_builtin_or_function
>
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:5488:14
>
>  #3 0x55b1ce43c098 in execute_simple_command
>
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:4740:13
>
>  #4 0x55b1ce430f33 in execute_command_internal
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:866:4
>
>  #5 0x55b1ce42ddb0 in execute_command
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:413:12
>
>  #6 0x55b1ce3ab36a in reader_loop
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../eval.c:171:8
>
>  #7 0x55b1ce3a07aa in main
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../shell.c:833:3
>
>  #8 0x7f0e8e7bdc8b (/lib64/libc.so.6+0x27c8b) (BuildId:
> 97aecaf3aeb712a8e66d84b5319d6cca2cf5528e)
>
>  #9 0x7f0e8e7bdd44 in __libc_start_main (/lib64/libc.so.6+0x27d44)
> (BuildId: 97aecaf3aeb712a8e66d84b5319d6cca2cf5528e)
>
>  #10 0x55b1ce2c6ef0 in _start
> (/artifacts/build-aflplusplus/bash-5.2.26/build-bash/bash+0x21cef0)
> (BuildId: be8de6b123ba7c6e8bc2e7fbc1afe38d8c8a487b)
>
> 0x508009f8 is located 0 bytes after 88-byte region
> [0x508009a0,0x508009f8)
>
> allocated by thread T0 here:
>
>  #0 0x55b1ce36112f in malloc
> /usr/src/RPM/BUILD/llvm-project-18/compiler-rt/lib/asan/asan_malloc_linux.cpp:68:3
>
>
>
>  #1 0x55b1ce6a82fc in xmalloc
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../xmalloc.c:114:10
>
>  #2 0x55b1ce5426a7 in dequote_string
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../subst.c:4891:24
>
>  #3 0x55b1ce5a2cbb in glob_expand_word_list
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../subst.c:12390:18
>
>  #4 0x55b1ce55057d in expand_word_list_internal
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../subst.c:13012:13
>
>  #5 0x55b1ce550351 in expand_words
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../subst.c:12284:11
>
>  #6 0x55b1ce439921 in execute_simple_command
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:4509:15
>
>
>
>  #7 0x55b1ce430f33 in execute_command_internal
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:866:4
>
>  #8 0x55b1ce42ddb0 in execute_command
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../execute_cmd.c:413:12
>
>  #9 0x55b1ce3ab36a in reader_loop
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../eval.c:171:8
>
>  #10 0x55b1ce3a07aa in main
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/../shell.c:833:3
>
>  #11 0x7f0e8e7bdc8b (/lib64/libc.so.6+0x27c8b) (BuildId:
> 97aecaf3aeb712a8e66d84b5319d6cca2cf5528e)
>
> SUMMARY: AddressSanitizer: heap-buffer-overflow
> /artifacts/build-aflplusplus/bash-5.2.26/build-bash/builtins/../../builtins/../../builtins/printf.def:492:7
>
> in printf_builtin
>
> Shadow bytes around the buggy address:
>
>  0x50800700: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
>
>  0x50800780: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
>
>  0x50800800: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
>
>  0x50800880: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fa
>
>  0x50800900: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
>
> =>0x50800980: fa fa fa fa 00 00 0

Re: bash passes changed termios to backgrounded process(es) groups?

2024-08-29 Thread Martin D Kealey
On Fri, 30 Aug 2024 at 04:17, Robert Elz  wrote:

> SIGTTOU is also sent, unconditionally, by any attempt to change any of
> the terminal's attributes, and the process (group) (by default) stops.
> (I don't recall off hand whether simply fetching the attributes is
> enough to generate SIGTTOU.)   Just as there's no stty option to avoid
> SIGTTIN (reading from the terminal) there's no option to avoid this
> kind of SIGTTOU - or there shouldn't be.
>

I've encountered something related to this, where the shell takes charge of
the terminal, away from another process that is using it.

This happens when trying to debug a modified version of Bash, with "gdb
./bash" then "run".
Gdb then stops twice before Bash prints its prompt, even though Bash
doesn't (seem to) print or read anything.

I assume that when gdb stops, the bash process underneath it will stall
when it hits something that gdb needs to be told about; or if not then,
when the outer shell regains control and resets the tpgrp to itself, the
inner shell would get SIGTTOU or SIGTTIN.

I was left wondering "why just twice", rather than once, not at all, or
repeating indefinitely; being triggered by tcsetattr could account for that.

>   | Sure. But if you are restarted (and get your SIGCONT) due to the equivalent
>   | of a `bg', you still have to check whether you're in the foreground.
>
> Well, kind of. The more common approach, by most applications, is to not
> bother to test, never ignore SIGTTIN/SIGTTOU, and simply go ahead and do
> whatever is needed;
> if the process stops because of one of those, and then is resumed as a
> background job, it will simply stop again when whatever made it stop is
> repeated.
> When it is resumed in foreground, it can do whatever is needed, and then
> (perhaps) be moved back to background later.
>

That's definitely where I was trying to go with my initial response, but
you've explained it better.

-Martin


Patch to unify shopt & set-o

2024-08-28 Thread Martin D Kealey
Hi Chet

On Wed, 28 Aug 2024 at 23:58, Chet Ramey  wrote:

> On 8/24/24 1:46 PM, Martin D Kealey wrote:
> > I've been making some tentative patches to the `devel` branch, and since
> I
> > have a fairly large bashrc, when I compile Bash with maximal debugging
> > support, its startup is ... underwhelmingly slothful.
>
> You're seeing the memory tracing and debugging code.
>

Thanks for that.

The patch I was referring to is at
https://github.com/kurahaupo/bash/tree/patch_options and it's almost ready
to go; time to let some other eyeballs take a look at it.

The purpose of my patch is to:
1. unify the handling of set -X, set -o XXX, and shopt -s XXX, so that
either command can manipulate all options, and there's a single module
underpinning both;
2. provide a pluggable framework, so that loadable modules can register new
shopt/set -o tags (and de-register them before unloading the module);
3. gather all the logic for each option in one place, without forcing all
options to be in one file.

The code makes extensive use of designated initializers, so after merging,
it would make C11 a requirement for building Bash. Please let me know if
that's likely to be an issue.

It has involved rewriting substantial portions of flags.[ch],
builtins/set.def, and builtins/shopt.def (around 60% of each file), and has
created options.[ch] and examples/loadables/is_prime.c; the total lines
changed is $( git diff -w -U0 devel..@ | wc -l ) == 5183, so I'm open to
making stylistic and other adjustments if that would make it easier to
merge.

It passes all the manual tests I've thrown at it, but "make test" is still
rather noisy so I have a way to go yet.

In addition to the original goals, it now includes:
* an example loadable that defines several options.
* an additional help mechanism, with new text, so that "shopt --long-help
OPTNAME" is more informative.
* a much-simplified implementation of the "compat" options, since all
options can now have computed values rather than *having* to set individual
flag variables.
* a clean up of anomalous whitespace (mostly because I kept tripping over
it every time I tried to commit with my default "paranoid" checking
enabled).
* re-indenting a few small patches, to match what appears to be your
preferred style.
* some small tweaks to quell compiler warnings.
* adjustments to xmalloc.[ch] to enable easier handling of "const"
pointers; among other things, xxfree is a clone of xfree, except that it
takes a const void* parameter instead of just void*.

I've tried to sequence the commits so that they tell a clean narrative; all
the whitespace changes are first, one commit per file, so that you can go
"git diff -w devel..ws" to reassure yourself that no other changes have
snuck in. Then the creation of the new "options" module; then integrating
it with others; then removing old stuff that's no longer required; and
finally moving the option definitions to be adjacent to the code whose
behaviour is adjusted by each option.

The last dozen or so commits are somewhat experimental, as I've adjusted
the documentation framework somewhat, so I will probably continue to clean
up and occasionally push-rebase my repo on github; please let me know when
you'd like me to *stop* doing so, so that you can grab a branch for merging.

I hope that helps your evaluation.
Are there any other administrative points I need to address?

-Martin


Re: bash passes changed termios to backgrounded process(es) groups?

2024-08-28 Thread Martin D Kealey
On Thu, 29 Aug 2024 at 06:12, Steffen Nurpmeso  wrote:

> Chet Ramey wrote in
>  <3ca901aa-5c5e-4be3-9a71-157d7101f...@case.edu>:
>  |On 8/27/24 7:46 PM, Steffen Nurpmeso wrote:
>  |> ..and it seems that if bash starts a normal process then ICRNL is
>  |> set, but if it starts a (process)& or only process&, then not!
>  |> (I was about to send this to bug-readline first.)
>  |
>  |Under no circumstances should a background process attempt to fetch or
>  |modify terminal attributes. Why isn't your Mail process checking for
> that?
>
> How could it do so?
> (getpid()==tcgetpgrp() or what the function name is is the only
> idea i have, but note it is false for (EXE), too.  *Big problem*!)
>

You'd want getpgid() or getpgrp(), rather than getpid(). (On Linux,
getpgrp() returns the same as getpid() to the process group leader, but
that's *not* true on *BSD & Darwin.)
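A shell-level rendition of that test (a sketch relying on the tpgid and
pgid columns, which most ps implementations provide):

    if (( $(ps -o tpgid= -p $$) == $(ps -o pgid= -p $$) )); then
      echo "my process group is in the foreground"
    else
      echo "background: touching the tty may raise SIGTTIN/SIGTTOU"
    fi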

Having said that, it's more likely that Mail is actively doing something it
shouldn't be doing, and stopping doing it will suffice to fix the issue.

My first guess would be blocking or ignoring SIGTTIN and/or SIGTTOU.

The default behaviour is for a background process to receive SIGTTIN or
SIGTTOU when it attempts to interact with its controlling terminal. Indeed,
not just the process itself, but every other process in the same process
group too.

And the default action in response to those signals is to stop, the same as
SIGSTOP. SIGTTIN is always sent if you try to read from the tty, but
SIGTTOU is only sent after "stty tostop" or equivalent.

So the main thing to do is to *avoid* ignoring or blocking those signals,
and to remove "stty -tostop" from your ~/.profile (or to add "stty tostop"
and complain to your OS vendor about their stupid default).

My memory is a little hazy on what happens if you attempt tcgetattr() or
tcsetattr() without otherwise reading or writing; in that case I suspect it
doesn't send either signal right away; so perhaps then the fix is simply to
try writing a welcome banner (or even just a single NL char) before
attempting tcgetattr(). (Maybe a zero-sized read or write might suffice?)

-Martin


Surprising results when profiling Bash

2024-08-24 Thread Martin D Kealey
I've been making some tentative patches to the `devel` branch, and since I
have a fairly large bashrc, when I compile Bash with maximal debugging
support, its startup is ... underwhelmingly slothful.

So I decided to build it with profiling enabled, and see if I'd done
something to ruin its performance. (Short answer: nope.)
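For reference, this is roughly the build-and-measure recipe I used (the
exact flags here are indicative rather than a transcript):

    ./configure CFLAGS='-pg -g -O0'   # -pg enables gprof instrumentation
    make
    ./bash -i -c exit                 # exercise interactive startup once
    gprof ./bash gmon.out | head -20  # flat profile, hottest entries first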

What stood out immediately is that 50%~90% of the time is spent in
mregister_free(). In theory gprof separates the time spent in subordinate
function calls, but there's no reporting of find_entry(), perhaps because
it's 'static', and therefore in-lined, so perhaps that's the real culprit.

What puzzles me is that this is much more than mregister_alloc(), during a
phase when *most* of the activity is defining new stuff rather than getting
rid of stuff.

I haven't tweaked anything in this area of the code.

Is this expected behaviour?
Do I need to change my compilation options, or make any other changes?

I haven't delved very deeply into this code, but it does seem to be
preoccupied with managing signals, presumably because the code isn't
re-entrant; so I'm wondering if it would be worthwhile to investigate
different kinds of allocators, or perhaps a different approach to handling
signals?

-Martin


Re: Bash History Behavior Suggestion

2024-08-20 Thread Martin D Kealey
The problem with the tagged format is that it's *not* usable by grep, so
you're limited to exactly whatever magic is built into the "history"
command.

"Yuck" is in the eye of the beholder. I've tried numerous other ways to
segregate sessions, and IMO multiple files was the "least yuck" of many
worse options.

All that said, I would like *some* additional information recorded in the
history file, especially $PWD (when it changes - interpreting "popd"
requires significant mental effort when reading a history file), and
$BASHPID (to track nested shells). With those I would be happy to have
~/.bash_history.d/$TTY, which would greatly reduce the number of files.
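A sketch of what I have in mind, using only existing facilities ($TTY is
not set by Bash by default, so it's derived here):

    t=$(tty); t=${t#/dev/}; t=${t//\//_}    # e.g. pts/3 -> pts_3
    mkdir -p ~/.bash_history.d
    HISTFILE=~/.bash_history.d/$t
    PROMPT_COMMAND='
      [[ $PWD != ${_histpwd-} ]] &&
        printf "# pwd=%q pid=%s\n" "$PWD" "$BASHPID" >> "$HISTFILE"
      _histpwd=$PWD
      history -a'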

Would that be few enough files to satisfy you?

-Martin

On Wed, 21 Aug 2024, 00:38 ,  wrote:

> Bash or no bash, spreading history over dozens of files in
> `bash_history.d/` is yuck.  We already have a comment with the timestamp
> in `.bash_history`. If I were implementing the suggestion, I would add
> more information to the comment, then add two new flags to the `history`
> command that filter+output the file (i.e. not the internal history
> list): `--global` to display everything in `.bash_history`, and
> `--local` to restrict output to entries from the current session.
> Everything else would remain as-is.
>
> So this:
>
>  #1724136363
>  man bash
>
> Becomes this:
>
>  #1724136363 [sess create time] [sess PID] [sess TTY]
>  man bash
>
> I think it is important to add the local/global flags because it gives
> us some leeway as to how that comment is structured.  If you take the
> line of "that's what grep is for", then we're committed to the v1 format
> forever after.
>
> The problem with the stackoverflow solutions is that they are
> all-or-nothing: either mash the history together across all sessions,
> but get strange behavior on history nav & expansion, or don't mash, be
> cut-off from information in concurrent sessions, and end up with the
> occasional unsaved session.  Being able to filter the file directly lets
> us look things up without having to slice-and-splice into the internal
> history.
>
> On 2024-08-20 6:14 am, Martin D Kealey wrote:
> > "Missing/disappearing history" is entirely down to the lack of "writing
> > history as you go", and yes that would be reasonable to offer as a new
> > opt-in feature.
> >
> > As for separation of sessions, I strongly suspect that anything between
> > *total* separation and *none* will result in so many ugly compromises
> > that
> > in the end almost nobody will be happy with it. So if there's to be an
> > additional option - which I'm not convinced of - I suggest that it
> > simply
> > be to set HISTFILE by default to either
> > $HOME/.bash_history.d/{some-pattern-here} (if the directory exists) or
> > ~/.bash_history (matching the current behaviour when that directory
> > does
> > not exist). I would recommend that the pattern include most or all of
> > $$,
> > $TTY, $LOGNAME, and $((EPOCHSECONDS-SECONDS)).
> >
> > Lastly, an awful lot of "default behaviour" is down to whatever
> > /etc/skel/.bashrc and /etc/bash/bashrc that are shipped with Bash by
> > the
> > various distros. Maybe Bash should start shipping some kind of
> > "standard
> > library" of functions that are *expected* to be included with any
> > distro,
> > but are not actually built into the binary.
> >
> > -Martin
> >
> > PS: complaining about "inelegant" in relation to Bash seems a bit
> > pointless.
> >
> > On Tue, 20 Aug 2024 at 16:48,  wrote:
> >
> >> I wouldn't consider dozens of stackoverflow/askubuntu/etc complaints
> >> of
> >> missing/disappearing history "cherry-picked".  There were far more
> >> than
> >> I sent.
> >>
> >> I understand not wanting to pull the rug out from under people, but
> >> the
> >> kludges Kealey posted were inelegant.  An opt-in for the suggested
> >> behavior would be good enough.
> >>
> >> JS
> >>
> >> On 2024-08-20 2:17 am, Lawrence Velázquez wrote:
> >> > On Tue, Aug 20, 2024, at 1:42 AM, supp...@eggplantsd.com wrote:
> >> >> The suggestion is that the default behavior needs some work
> >> >
> >> > The default behavior is unlikely to change.  For every cherry-picked
> >> > example of someone unsatisfied with it (bugs aside), there is likely
> >> > someone else who prefers it as is (or at least would not appreciate
> >> > it changing out from under them).  New shopt settings may be doable.
> >>
> >>
>


Re: Bash History Behavior Suggestion

2024-08-20 Thread Martin D Kealey
"Missing/disappearing history" is entirely down to the lack of "writing
history as you go", and yes that would be reasonable to offer as a new
opt-in feature.

As for separation of sessions, I strongly suspect that anything between
*total* separation and *none* will result in so many ugly compromises that
in the end almost nobody will be happy with it. So if there's to be an
additional option - which I'm not convinced of - I suggest that it simply
be to set HISTFILE by default to either
$HOME/.bash_history.d/{some-pattern-here} (if the directory exists) or
~/.bash_history (matching the current behaviour when that directory does
not exist). I would recommend that the pattern include most or all of $$,
$TTY, $LOGNAME, and $((EPOCHSECONDS-SECONDS)).

Lastly, an awful lot of "default behaviour" is down to whatever
/etc/skel/.bashrc and /etc/bash/bashrc that are shipped with Bash by the
various distros. Maybe Bash should start shipping some kind of "standard
library" of functions that are *expected* to be included with any distro,
but are not actually built into the binary.

-Martin

PS: complaining about "inelegant" in relation to Bash seems a bit pointless.

On Tue, 20 Aug 2024 at 16:48,  wrote:

> I wouldn't consider dozens of stackoverflow/askubuntu/etc complaints of
> missing/disappearing history "cherry-picked".  There were far more than
> I sent.
>
> I understand not wanting to pull the rug out from under people, but the
> kludges Kealey posted were inelegant.  An opt-in for the suggested
> behavior would be good enough.
>
> JS
>
> On 2024-08-20 2:17 am, Lawrence Velázquez wrote:
> > On Tue, Aug 20, 2024, at 1:42 AM, supp...@eggplantsd.com wrote:
> >> The suggestion is that the default behavior needs some work
> >
> > The default behavior is unlikely to change.  For every cherry-picked
> > example of someone unsatisfied with it (bugs aside), there is likely
> > someone else who prefers it as is (or at least would not appreciate
> > it changing out from under them).  New shopt settings may be doable.
>
>


Re: Bash History Behavior Suggestion

2024-08-19 Thread Martin D Kealey
sorry, I meant HISTTIMEFORMAT rather than HISTTIMEFMT

On Tue, 20 Aug 2024 at 14:58, Martin D Kealey 
wrote:

> The following suggestions, or close approximations, can all be implemented
> using the existing facilities.
>
> On Tue, 20 Aug 2024 at 05:52,  wrote:
>
>> I would suggest:
>>
>> 1. Append to history file immediately on each command.
>>
>
> Easily done by putting `history -a` into `PROMPT_COMMAND`
>
> 2. Restrict up-arrow completion to the history of present session.
>>
>
> That's easy. Simply don't use `history -r` in your .bashrc or
> /etc/bash/bashrc.
>
> (Unfortunately modifying the latter will require admin access to your
> host, so choose a distro that does NOT include `history -r` among its
> system-wide shell start-up files.)
>
> 3. Add column(s) to the history file to identify the session the command
>> came from (pty, pid, etc).
>>
>
> I simply write the history for each session into a separate file; I have
>
>  HISTFILE=$HOME/.bash_history.d/$EPOCHSECONDS.$TTY.$$
>
> That way I can simply use a pager such as `less` to read the file I'm
> interested in. If I want to see the timestamps, I can use:
>
>   ( HISTTIMEFMT="%F,%T " HISTFILE={other-history-file} ; history -c ;
> history -r ; history ) | less
>
> 4. Add options to the 'history' command to toggle between session-local
>> and global reporting.
>>
>
> I simply use separate commands to view the current session's history vs
> all sessions.
> I generally prefer not to interleave multiple sessions, but on the rare
> occasion when I do want this, I can simply use:
>
> ( cd $HOME/.bash_history.d ; HISTTIMEFORMAT="%F,%T " ; for  HISTFILE in *
> ; do ( history -c ; history -r ; history ) ; done ) | sort | less
>
> If I did this often enough to actually care, I'd wrap it in a function.
>


Re: Bash History Behavior Suggestion

2024-08-19 Thread Martin D Kealey
The following suggestions, or close approximations, can all be implemented
using the existing facilities.

On Tue, 20 Aug 2024 at 05:52,  wrote:

> I would suggest:
>
> 1. Append to history file immediately on each command.
>

Easily done by putting `history -a` into `PROMPT_COMMAND`

2. Restrict up-arrow completion to the history of present session.
>

That's easy. Simply don't use `history -r` in your .bashrc or
/etc/bash/bashrc.

(Unfortunately modifying the latter will require admin access to your host,
so choose a distro that does NOT include `history -r` among its system-wide
shell start-up files.)

3. Add column(s) to the history file to identify the session the command
> came from (pty, pid, etc).
>

I simply write the history for each session into a separate file; I have

 HISTFILE=$HOME/.bash_history.d/$EPOCHSECONDS.$TTY.$$

That way I can simply use a pager such as `less` to read the file I'm
interested in. If I want to see the timestamps, I can use:

  ( HISTTIMEFMT="%F,%T " HISTFILE={other-history-file} ; history -c ;
history -r ; history ) | less

4. Add options to the 'history' command to toggle between session-local
> and global reporting.
>

I simply use separate commands to view the current session's history vs all
sessions.
I generally prefer not to interleave multiple sessions, but on the rare
occasion when I do want this, I can simply use:

( cd $HOME/.bash_history.d ; HISTTIMEFORMAT="%F,%T " ; for  HISTFILE in * ;
do ( history -c ; history -r ; history ) ; done ) | sort | less

If I did this often enough to actually care, I'd wrap it in a function.


Re: please make the commit log clean

2024-08-19 Thread Martin D Kealey
On Mon, 19 Aug 2024 at 06:45, shynur .  wrote:

> I believe these output files should be added to `.gitignore` and generated
> during the `make` process.


Not doing so is deliberate in some cases.

In an ideal world, yes they should be generated during `make`, but that
would increase the "build toolset" that everyone would have to install,
including people who are writing code patches rather than documentation.

Otherwise, they will severely pollute the commit history, making it much
> harder for future maintainers to understand and manage the repository.
>

As long as the generated files start with a "generated from" comment, it's
not really a great imposition on developers.
There are much bigger learning issues than managing the documentation.

Perhaps a compromise would be to put the documentation in a directory
that's not inside the source code directory, so it's easier to `git diff`
just one or the other.  (In practice, that would mean moving some of the
code into a new subdirectory.)

-Martin


Re: Potentially misleading documentation of SECONDS variable

2024-08-18 Thread Martin D Kealey
The fundamental problem with using phrases like "the run time of the current
process" is that there's NO POSSIBLE adjectival qualifier that can be added
to such a phrase such that the combination correctly describes the actual
operation.

What's needed is a statement that the value of SECONDS is the current
system time, minus the system clock when the process started or a value was
last assigned, plus whatever value was assigned (if any), with each reading
of the system clock taken at whole-second resolution.
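A demonstration of that whole-second sampling (timing-dependent; needs Bash
5.0+ for EPOCHREALTIME):

    # busy-wait until just before a clock-second boundary ...
    until [[ $EPOCHREALTIME == *.9[5-9]* ]]; do :; done
    # ... then a run of roughly 0.1 s straddles the tick:
    bash -c 'sleep 0.1; echo "SECONDS=$SECONDS"'   # usually prints SECONDS=1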

-Martin

On Thu, 15 Aug 2024, 17:54 Bash-help via Bug reports for the GNU Bourne
Again SHell,  wrote:

> On 15 August 2024 08:57:42 CEST, felix  wrote:
> >The variable $SECOND won't intend to be exact to the nanoseconds!
> >
> If you have read the thread you should know that this fact is already
> established.
>
> >[...] This variable is intended to show current
> >time of execution, at SECOND resolution.
> >
> The problem I saw, and lifted here, was that this is not the case, is it?
> The examples provided earlier in the mail thread clearly show that I can
> have I script run for i.e. 0.1 second and the $SECONDS variable show 1
> second has passed. This is WRONG, by all possible interpretations. A minor
> change in the documentation would make the behaviour understandable and
> acceptable, which I think is a good way forward.
>


Re: whats wrong , exit code 11 on android termux

2024-08-09 Thread Martin D Kealey
Sorry, that was supposed to be a personal reply off-list.

On Sat, 10 Aug 2024 at 12:01, Martin D Kealey 
wrote:

> On Thu, 8 Aug 2024 at 03:14, alex xmb sw ratchev 
> wrote:
>
>> mr chet
>>
>
> I REALLY get annoyed when strangers call me "Mister Martin" or write "Mr
> Martin". I am NOT a child, so how DARE they mock me like that.
>
> The short version: Some folk don't care, others don't know any better, but
> if you suspect the person you're talking to is over 40, I would strongly
> recommend you avoid this style of address - unless of course you WANT to
> mock them.
>
> The longer version: My family name is "Kealey", and my given name is
> "Martin". Only small children (or adults too young to know better) allow
> themselves to be called by an honorific with their first name, unless the
> two people are very close friends or family.
>
> If you really need to be formal, write "Mr Kealey", but otherwise just
> call me "Martin". (*1)
>
> If anyone tells you that honorific+given name is the preferred polite way
> to talk to older people, tell them you know someone whose native language
> is English who says that what they're saying is "polite" is actually an
> insult to older people, and if they still don't believe you, tell them to
> contact me directly.
>
> -Martin Kealey
>
> *1: if you suspect that the person is over 70, you'd best avoid their
> given name entirely, and stick to honorific+surname.
>


Re: whats wrong , exit code 11 on android termux

2024-08-09 Thread Martin D Kealey
On Thu, 8 Aug 2024 at 03:14, alex xmb sw ratchev  wrote:

> mr chet
>

I REALLY get annoyed when strangers call me "Mister Martin" or write "Mr
Martin". I am NOT a child, so how DARE they mock me like that.

The short version: Some folk don't care, others don't know any better, but
if you suspect the person you're talking to is over 40, I would strongly
recommend you avoid this style of address - unless of course you WANT to
mock them.

The longer version: My family name is "Kealey", and my given name is
"Martin". Only small children (or adults too young to know better) allow
themselves to be called by an honorific with their first name, unless the
two people are very close friends or family.

If you really need to be formal, write "Mr Kealey", but otherwise just call
me "Martin". (*1)

If anyone tells you that honorific+given name is the preferred polite way
to talk to older people, tell them you know someone whose native language
is English who says that what they're saying is "polite" is actually an
insult to older people, and if they still don't believe you, tell them to
contact me directly.

-Martin Kealey

*1: if you suspect that the person is over 70, you'd best avoid their given
name entirely, and stick to honorific+surname.


Re: Incorrect positioning when long prompt contains ANSI escape sequences + UTF-8 LANG

2024-08-09 Thread Martin D Kealey
Hi Gioele

Typically problems with the prompt are because the \[ and \] are misplaced
or completely missing, but in this case the bug report indicates that they
have indeed been used correctly; so thank you for checking that first.

The fact that characters are all printed in the same place (over each
other) leads me to suspect mis-handling of auto-margins, which is also
known to cause problems with the prompt.

Does this problem occur with other versions of Bash?
With other versions of libreadline.so?
With other terminal emulators?

Please let us know which ones you've tested, and which of them do or don't
exhibit the problem.

If the problem persists with all permutations, please also:

1. report the output from these commands:

echo term=$TERM
e=( enabled disabled )
tput am ; echo right-auto-margin=${e[$?]:-$?}
tput bw ; echo left-auto-margin=${e[$?]:-$?}
tput sam ; echo semi-auto-margin=${e[$?]:-$?}

2. report the active auto-margin setting in your terminal emulator, at the
point where you observe the issue (and before entering or erasing anything)

If using Xterm, use the ctrl-middle-button menu and note whether the
"Enable Auto Wraparound" and "Enable Reverse Wraparound" settings are
enabled. If using some other terminal emulator, consult its documentation
for the corresponding settings.

-Martin

On Fri, 9 Aug 2024 at 05:40, Gioele Barabucci  wrote:

> Hi,
>
> bash 5.2.21 produces severely wrong artifacts under the following
> conditions:
>
> * the length of the prompt matches $COLUMNS*2 + 1;
> * the prompt contains ANSI escape sequences;
> * the LANG variable is set to an installed UTF-8 locale.
>
> When all these conditions are met, pressing the up arrow/down arrow will
> place the cursor in the wrong spot. After that, all typed character will
> be shown in the same place, overwriting each other.
>
> To reproduce:
>
>  $ set LANG=C.UTF-8
>  $ PS1=$(eval "printf x%.0s {1..$((COLUMNS*2-1))}")$'\[\e[0m\]\$ '
>  (press up arrow)
>  (type anything; text ends up in the wrong place)
>  $ unset LANG
>  (press up arrow)
>  (type anything; text is displayed correctly)
>
> Extracted from https://bugs.debian.org/1018851
>
> Regards,
>
> --
> Gioele Barabucci
>
>


Re: Bogus (intptr_t) casts

2024-08-06 Thread Martin D Kealey
Why just those ones? Mainly:
(a) I'm looking at patching that area of the code for other reasons, so
they're the ones that I happened to encounter; and
(b) I didn't want to over-cook it, so I only included the ones where I
could see that it was actually a pointer (casting a number to an intptr_t
doesn't result in UB).

Other cases that involve casting a pointer to an intptr_t or uintptr_t, and
then comparing against a *numeric* zero should be similarly updated.

To my knowledge all current compilers use a numeric zero to represent
NULL,  but this is not guaranteed, and might change in the future.

-Martin


On Tue, 6 Aug 2024, 01:17 Chet Ramey,  wrote:

> On 8/1/24 4:12 AM, Martin D Kealey wrote:
>
>
> > It follows that the following assertions are allowed to fail:
> >
> >    intptr_t i = 0;
> >    assert((void*)i == (void*)0);
> >    void *p = 0;
> >    assert((intptr_t)p == 0);
> >
> > Accordingly I provide the following patch:
>
> I'm wondering why you chose these two cases, since there are other very
> similar uses of intptr_t casts.
>
> >
> > diff --git a/subst.c b/subst.c
> > index 37e0bfa7..140a3a92 100644
> > --- a/subst.c
> > +++ b/subst.c
> > @@ -6875,7 +6875,7 @@ uw_restore_pipeline (void *discard)
> >  static void
> >  uw_restore_errexit (void *eflag)
> >  {
> > -  change_flag ('e', (intptr_t) eflag ? FLAG_ON : FLAG_OFF);
> > +  change_flag ('e', eflag ? FLAG_ON : FLAG_OFF);
> >    set_shellopts ();
> >  }
> >
> > diff --git a/variables.c b/variables.c
> > index cd336c85..d44453fe 100644
> > --- a/variables.c
> > +++ b/variables.c
> > @@ -5444,7 +5444,7 @@ pop_scope (void *is_special)
> >    FREE (vcxt->name);
> >    if (vcxt->table)
> >      {
> > -      if ((intptr_t) is_special)
> > +      if (is_special)
> > 	hash_flush (vcxt->table, push_builtin_var);
> >       else
> > 	hash_flush (vcxt->table, push_exported_var);
>
> You might want to look at the unwind-protect implementation, which doesn't
> use assignments. It uses byte copies, so instead of using an assignment of,
> say, 0, where the compiler can assign whatever it wants to denote a NULL
> pointer, it copies 4-8 bytes, depending on the size of an integer. The cast
> of that memory back to an intptr_t should be transparent on all reasonably
> common systems.
>
> Of course, if you can provide an example where it fails, I'll look at it
> and fix it.
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>


Bogus (intptr_t) casts

2024-08-01 Thread Martin D Kealey
Hi Chet

According to ISO/IEC 9899-2017, §6.3.2.3(3):

“An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. If a null pointer
constant is converted to a pointer type, the resulting pointer, called a
null pointer, is guaranteed to compare unequal to a pointer to any object
or function.”

These constraints do *not* imply that NULL has to be implemented as
all-zero-bits, and there have been real compilers which implemented NULL as
all-one-bits values (mainly to ensure a dereference would result in a
misaligned word access, without requiring extra hardware to trap on the 0
address).

In addition, conversion between pointers and integers is permitted by
§6.3.2.3(5) & (6), but is implementation defined (and may be undefined).

“An integer may be converted to any pointer type. Except as previously
specified, the result is implementation-defined, might not be correctly
aligned, might not point to an entity of the referenced type, and might be
a trap representation.

Any pointer type may be converted to an integer type. Except as previously
specified, the result is implementation-defined. If the result cannot be
represented in the integer type, the behavior is undefined. The result need
not be in the range of values of any integer type.”

§7.20.1.4(1) requires the optional types intptr_t and uintptr_t, if they
exist, to provide round-tripping from and back to void* pointer values,
including NULL (meaning that the last sentence of §6.3.2.3(6) above would
not apply):




“The following type designates a signed integer type with the property that
any valid pointer to void can be converted to this type, then converted back
to pointer to void, and the result will compare equal to the original
pointer:

> intptr_t

The following type designates an unsigned integer type with the property
that any valid pointer to void can be converted to this type, then converted
back to pointer to void, and the result will compare equal to the original
pointer:

> uintptr_t

These types are optional.”

It follows that the following assertions are allowed to fail:

  intptr_t i = 0;
  assert((void*)i == (void*)0);
  void *p = 0;
  assert((intptr_t)p == 0);

Accordingly I provide the following patch:

diff --git a/subst.c b/subst.c
index 37e0bfa7..140a3a92 100644
--- a/subst.c
+++ b/subst.c
@@ -6875,7 +6875,7 @@ uw_restore_pipeline (void *discard)
 static void
 uw_restore_errexit (void *eflag)
 {
-  change_flag ('e', (intptr_t) eflag ? FLAG_ON : FLAG_OFF);
+  change_flag ('e', eflag ? FLAG_ON : FLAG_OFF);
   set_shellopts ();
 }

diff --git a/variables.c b/variables.c
index cd336c85..d44453fe 100644
--- a/variables.c
+++ b/variables.c
@@ -5444,7 +5444,7 @@ pop_scope (void *is_special)
   FREE (vcxt->name);
   if (vcxt->table)
     {
-      if ((intptr_t) is_special)
+      if (is_special)
	hash_flush (vcxt->table, push_builtin_var);
      else
	hash_flush (vcxt->table, push_exported_var);

-Martin


Re: if source command.sh & set -e issue

2024-07-28 Thread Martin D Kealey
On Wed, 24 Jul 2024, Greg Wooledge wrote:
> Remember how -e is defined:
>
> -e [...] The shell does not exit if the command that fails is [...] any
> command in a pipeline but the last

diff --git a/doc/bash.1 b/doc/bash.1
index cd355a3..266fe35 100644
--- a/doc/bash.1
+++ b/doc/bash.1
@@ -10327,7 +10327,7 @@ reserved words, part of any command executed in a
 or
 .B ||
 list except the command following the final \fB&&\fP or \fB||\fP,
-any command in a pipeline but the last,
+any command in a pipeline but the last (unless \fBpipefail\fP applies),
 or if the command's return value is
 being inverted with
 .BR ! .
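To illustrate the behaviour the patched sentence describes:

    bash -c 'set -e; false | true; echo survives'          # prints "survives"
    bash -c 'set -e -o pipefail; false | true; echo gone'  # exits before echo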



Re: improving '{...}' in bash?

2024-07-23 Thread Martin D Kealey
On Tue, 23 Jul 2024, 15:50 Harald Dunkel,  wrote:

> Hi folks,
>
> This feels weird:
>

Did you read the manual before trying any of these?

% echo x{1,2}x
> x1x x2x
> % echo x{1}x
> x{1}x
>

Why are you trying to use a multiplier syntax when you don't have more than
one option?

Be aware that brace expansion occurs before variable expansion, so you
can't put a brace-style list in a variable and then expect it to be
expanded; brace expansion is only intended to be used with literals, and
nobody would bother to write such a literal.
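For example:

    list='1,2,3'
    echo x{$list}x     # prints x{1,2,3}x -- the braces survive, because no
                       # comma was visible when brace expansion ran
    echo x{1,2,3}x     # prints x1x x2x x3x -- only literal braces expand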

Besides this, the shell is *required* not to replace braces *except* for
the few express patterns described in the manual.

% echo x{1..3,5}x
> x1..3x x5x
>

That's what I would expect, yes.

> I would have expected "x1x" and "x1x x2x x3x x5x".


Not when it's missing the second pair of braces; perhaps you intended to
use `echo x{{1..3},5}x`?

-Martin


Re: pwd and prompt don't update after deleting current working directory

2024-07-18 Thread Martin D Kealey
TL;DR: what you are asking for is unsafe, and should never be added to any
published version of any shell.

On Tue, 16 Jul 2024 at 17:47, David Hedlund  wrote:

> Do you think that it would be appropriate to submit this feature request
> to the developers of the rm command instead.
>

This suggestion hints at some serious misunderstandings.

Firstly, under normal circumstances two processes cannot interfere with
each other's internal states (*1) - and yes, every process has a *separate*
current directory as part of its internal state.

*Most* of that internal state is copied from its parent when it starts,
which gives the illusion that the shell is changing things in its children,
but in reality, it's setting their starting conditions, and cannot
influence them thereafter.

Secondly, *most* commands that you type into a shell are separate programs,
not part of the shell. Moreover, the *terminal* is a separate program from
the shell, and they can only interact through the tty byte stream.

Thirdly, the kernel tracks the current directory on behalf of each process.
It tracks the directory by its identity, *not* by its name. (*2) This means
that you can do this:

$ mkdir /tmp/a
$ cd /tmp/a
$ mv ../a ../b
$ /bin/pwd
/tmp/b

Note that as an efficiency measure, the built-in `pwd` command and the
expansion `$PWD` give the answer cached by the most recent `cd`, so this
should be considered unreliable:

$ pwd
/tmp/a
$ cd -P .
$ pwd
/tmp/b

> For comparison, caja (file manager in MATE) is stepping back as many
> directories as needed when it is located in a directory that is deleted in
> bash or caja.
>

Comparing programs with dissimilar purposes is, erm, unconvincing.

Caja's *first* purpose is to display information about a filesystem.
To make this more comprehensible to the user, it focuses on one directory
at a time. (*3)

Critically, every time you make a change, it shows you the results before
you can make another change.

That is pretty much the opposite of a shell.

Bash (like other shells) is primarily a scripting language and a command
line interface, whose purpose is to invoke other commands (some of which
may be built-ins (*4)). The shell is supposed to *do* things *without*
showing you what's happened. If you want to see the new state of the
system, you ask it to run a program such as `pwd` or `ls` to show you.
(*5)

Currently if a program is invoked in an unlinked current directory, most
likely it will complain but otherwise do nothing.
But if the shell were to surreptitiously change directory, a subsequent
command invoked in an unexpected current directory could wreak havoc,
including deleting or overwriting the wrong files or running the wrong
programs, and with no guarantee that there will be any warning indications.

All that said, if you want to risk breaking your own system, feel free to
add the relevant commands to `PROMPT_COMMAND` as suggested by other folk.
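For example (a loudly-hedged, untested sketch; the semantics of a deleted
current directory vary between systems, so test before trusting it):

    PROMPT_COMMAND='
      if [[ ! -d $PWD ]]; then              # recorded path no longer exists
        p=$PWD
        until cd "${p:-/}" 2>/dev/null; do  # climb to nearest surviving parent
          p=${p%/*}
        done
      fi'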

-Martin

*1: Okay, yes there are debugging facilities, but unless the target program
is compiled with debugging support, attempting to change the internal state
of the other program stands a fair chance of making it crash instead. You
certainly wouldn't want "rm" to cause your interactive shell to crash. And
there are signals, most of which default to making the target program
*intentionally* crash.

*2: Linux's */proc/$pid/cwd* reconstructs the path upon request. Only when
it's deleted does it save the old path with "(deleted)" appended.

*3: It's not even clear that this focal directory is the kernel-level
current directory of the Caja process, but it probably is. I would have to
read the source code to verify this.

*4: mostly *regular* built-ins that behave as if they were separate
programs; not to be confused with *special* built-ins, which can do things
to the shell's internal state.

*5: Even if the shell's prompt includes its current directory - which isn't
the default - it could be out of date by the time the user presses *enter*
on their next command.


Re: proposed BASH_SOURCE_PATH

2024-07-08 Thread Martin D Kealey
On Mon, 8 Jul 2024 at 14:42, Oğuz  wrote:

> On Monday, July 8, 2024, Martin D Kealey  wrote:
>>
>> It's not possible to change "${BASH_SOURCE[@]}" without breaking some
>> existing code,
>>
>
> It's worth breaking existing code in this case.
>

The only things that the shell has going for it are that it's widely
deployed and stable over the long term.

Otherwise it's a terrible language, and any sane programmer should avoid it
entirely:

   - its syntax resembles no other language, with fun quirks such as
   intentionally mismatched brackets;
   - its lexical tokenization depend on at least 5 different quoting styles;
   - text may or may not be evaluated as a numeric expression, based on
   flags set elsewhere with dynamic duration;
   - text may or may not be split into "words" based on delimiters set
   elsewhere with dynamic duration;
   - text may or may not be globbed into matching filenames, yet again
   depending on a dynamic switch;
   - lifetimes for different kinds of entities are controlled by 3
   different overlapping scoping rules;
   - processes are classified and grouped in arcane ways, leading to the
   current complaints about the lifetime of output command substitutions.

If you take away stability then existing code breaks. When that happens
enough times, people get fed up and either rewrite the code in another
language, or completely replace it with a different project. When that
happens enough, there's no point including Bash in the base set for a
distro, so it's no longer universally available.

This has already been happening, and Bash is >this< close to becoming an
irrelevant historical footnote.

If you modify Bash in ways that are not backwards compatible, you're then
writing in a new language that no new project is likely to adopt.

which leaves us with some kind of explicit opt-in such as:
>>
>
> `shopt -s compat52' should suffice to opt out of the new default. No point
> in making it more complicated than that.
>

That is how we got into the current mess: by assuming that "someone" will
go around and adjust all the already-deployed scripts, by adding a
"compatNN" option that did not exist when the script was written.

For example, I have a Ubiquiti ER-X router, as do several of my friends and
family.
This device has Bash supplied by the vendor. If the vendor ever pushes a
future version of Bash with breaking updates, even though they will have
fixed *their* scripts, my internet connection will die before I find out
that I need to patch the scripts I've installed in it. And then I have to
go track down the other people who've installed copies of my scripts, and
get them to update them (which will be difficult if it has broken their
internet).

That's what "worth breaking existing code" costs in reality: other people's
stuff breaks when they've had zero advance notice, because they aren't the
people deciding to upgrade Bash.

-Martin

PS: this situation would be somewhat ameliorated if it were possible to use
shopt -s compat$CURRENT_BASH_VERSION, so that it won't need modifying to be
compatible with a future release of Bash. Having to wait until the next
version of Bash is released before it can be patched to say what version it
needs is cruel.
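
For illustration, a sketch of what that would let a script say (hypothetical,
since today's Bash only accepts compat values for versions older than itself):

  # pin behaviour to whatever version this script was written under
  shopt -s "compat${BASH_VERSINFO[0]}${BASH_VERSINFO[1]}"  # e.g. compat52 under Bash 5.2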

PPS: In my opinion, the only hope for Bash to continue to exist in the long
term is for it to either:
(a) absolutely guarantee stability, forsaking *all* new features; or
(b) adopt a full suite of features that make it into a "normal" programming
language, including: support for modules written for different versions of
Bash to safely cohabitate in a single script; lexical scoping with
namespaces; being able to store references in variables, including some
kinds of handles for filedescriptors, functions, processes, and process
groups; some mechanism to perform rewriting during parsing (going well
beyond what aliases can do) so that new features can be proposed and
implemented in shell before being implemented in the C core. And all of
that while not breaking code that doesn't ask for these new features.


Re: proposed BASH_SOURCE_PATH

2024-07-07 Thread Martin D Kealey
On Mon, 8 Jul 2024, 05:23 alex xmb sw ratchev,  wrote:

> i dont get the BASH_SOURCE[n] one
> the point of prefix $PWD/ infront of relative paths is a static part of
> fitting into the first lines of the script , assigning vars
>

That's not the only use case.

Consider where you have a script that uses two independently written
libraries, each comprising a main and number of ancillary files. Each
library is installed in its own directory, but that directory isn't encoded
into the library.

The standard advice would be to add both directories to PATH (or some
stand-in such as BASH_SOURCE_PATH), however remember, these are
independently written libraries, and the same filename could be used for
files in both libraries, or the "main" script.

By far the most straightforward way to avoid this problem is to source
files using paths relative to (the directory containing) the file
containing the "." or "source" statement itself. But there is no fully
general, portable, and reliable ways to do this, since:
* "${BASH_SOURCE[0]}" might be a relative path based on somewhere in PATH
rather than $PWD, or relative to a different $PWD that's been outdated by cd
;
* "${BASH_SOURCE[0]}" might be a symbolic link into a different directory;
* The directory containing any given file might be unreachable from the
root directory (because of filesystem permissions, process restrictions
(SELinux contexts and equivalents on other OSes), version shadowing, mount
shadowing, soft unmounting, mount namespaces, and probably numerous other
reasons I haven't thought of).

While some of these are intractable, Bash itself at least has a better
chance of getting it right than having to embed screeds of boilerplate code
in every "portable" script. (The more portable/reliable the boilerplate
solution is, the larger and more complex it is, and if it involves
realpath, the slower it gets.)
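
For reference, the kind of boilerplate being discussed is a sketch along
these lines (assuming GNU realpath is available; it still fails if
${BASH_SOURCE[0]} was recorded relative to a $PWD that has since changed):

  __dir__=$(dirname -- "$(realpath -- "${BASH_SOURCE[0]}")") &&
  source "$__dir__/file_in_same_dir.bash"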

It's not possible to change "${BASH_SOURCE[@]}" without breaking some
existing code, which leaves us with some kind of explicit opt-in such as:
# 1. mark the source command itself
source -r file_in_same_dir.bash

# 2. Change the default behaviour via shopt/-O
#!/bin/bash -Orelsource
source file_in_same_dir.bash

# 3. set all the forward compat options by controlling argv[0]
#!/bin/bash7
source file_in_same_dir.bash

Or else we could use a new variable such as "${BASH_SOURCE_DIR[@]}" to hold
the normalized directories (and they're slightly less work than normalizing
the whole path and then discarding the last component).

Whatever solution is chosen, I would like it to be easier for a script
author to do the right thing than to do the wrong thing. And all the better
if it could quietly fix the myriad scripts out there that assume [[ ${0%/*}
-ef . ]].

-Martin


Re: waiting for process substitutions

2024-07-03 Thread Martin D Kealey
On Thu, 4 Jul 2024, 03:21 Chet Ramey,  wrote:

> Why not just wait for all process substitutions?



> Process substitutions [...] are not expected to survive their read/write
> file descriptors becoming invalid. You shouldn't need to `wait' for them;
> they're not true asynchronous processes.
>

An exception to this would be shell scripts that are interactive, and
(would like to) use process substitutions as output filters to show stuff
to the user.

"Wait for none" is clearly unsatisfactory, but "wait for all" is also
unsatisfactory if we have intentionally backgrounded processes as well as
user interaction.
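
A sketch of the interactive case (sed here is just a stand-in for any
line-oriented output filter):

  exec > >(sed 's/^/| /')   # route all subsequent output through the filter
  echo hello
  exec >&-                  # EOF ends the filter's input...
  # ...but there's no reliable way to wait for it to finish flushing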

-Martin


Re: waiting for process substitutions

2024-06-30 Thread Martin D Kealey
On Sun, 30 Jun 2024 at 05:08, Zachary Santer  wrote:

> On the other hand, I'm pretty sure
> command-1 | tee >( command-2 ) >( command-3 ) >( command-4 )
> will terminate as soon as command-1 and tee have terminated, but the
> command substitutions could still be running. If you want to run
> commands like this on the command line, it still might be useful to
> know when the command substitutions have terminated.
>

This is a valid concern, and one that has vexed me too.

Broadly I've found two approaches useful:

1. have each command substitution place its own $BASH_PID somewhere that
can be retrieved by the main script, so that an explicit wait $pid will
work; and
2. create a shared output pipe whose only purpose is for something to wait
for EOF.

And then there's a hybrid approach which uses the monitoring pipe to convey
the PIDs and their respective exit statuses, which tends to look like this:

{
  # NB: this block uses 'local', so it must appear inside a function
  save_shopt_lastpipe
  shopt -s lastpipe

  local -A seq_to_stat=()
  local -Ai seq_to_pid=()
  {
exec 4>&1 >&3-
echo >&4 S 0 $BASHPID -
cmd1 |
tee >(
echo >&4 S 1 $BASHPID -
cmd2
echo >&4 F 1 $BASHPID $?
  ) >(
echo >&4 S 2 $BASHPID -
cmd3
echo >&4 F 2 $BASHPID $?
  )
echo >&4 F 0 $BASHPID ${PIPESTATUS[0]}
  } |
  while
IFS=' ' \
read fn seq pid stat
  do
    seq_to_stat[$seq]=$stat   # 'F' records overwrite the '-' set by 'S' records
    seq_to_pid[$seq]=$pid
  done

  (( ${#seq_to_stat[@]} == 3 )) ||   die "Failed to capture initiation of
some command substitutions"
  [[ ${seq_to_stat[*]} != *-* ]] ||  die "Failed to capture termination of
some command substitutions"
  wait "${seq_to_pid[@]}"  # don't actually care about the statuses of the
subshells, but make sure zombies are cleaned up

  restore_shopt_lastpipe
  # fill up return PIPESTATUS...
  ( exit ${seq_to_stat[0]} ) |
  ( exit ${seq_to_stat[1]} ) |
  ( exit ${seq_to_stat[2]} )
} 3>&1

That's ~30 static lines plus 4 lines for each substitution; hardly ideal
but it does work without needing any more features added to Bash.

(Realistically, proper error handling would be longer than this anyway, so
it's probably not *that* verbose.)

-Martin


Re: feature suggestion: ability to expand a set of elements of an array or characters of a scalar, given their indices

2024-06-29 Thread Martin D Kealey
On Fri, 28 Jun 2024, 18:31 Oğuz,  wrote:

> On Friday, June 28, 2024, Martin D Kealey  wrote:
>
>> modern Perl scripts
>>
>
> No such thing.
>

For the purpose of this argument, "modern" means anything written in the
last 25 years, targeting Perl 5 rather than Perl 4.

Perl is a dead language,
>

Whether you think Perl is dead, or indeed whether Perl is actually dead,
doesn't affect the validity of my point: it's a historical precedent
demonstrating that it's possible and practical to get rid of insane
behaviour from a language.

and for good reason.
>

If "good reasons" were actually sufficient to kill off a language, the
shell would have died before 2000, and PHP would have been stillborn.

Even the things that Perl did wrong could help guide us in better
directions.

-Martin

PS:

Some folk think Perl is "hard" and/or "ugly" because it doesn't look like
languages they're used to. Guess what: that applies to all languages. Try
reading MATLAB or LISP or Prolog or PostScript or YACC. Or Thai or Cherokee
or Tok Pisin.

Some folk hate Perl's sigils because they hate punctuation generally. (But
then I don't know why they would tolerate the shell, much less like it.)

Some folk hate that Perl has more than one way to do any given task. Some
folk love Perl for exactly that reason.

Mostly, younger folk have been told that "Perl is dead and for good
reason", so they avoid it without even trying to make their own assessment.
Perl has moved on a long way since 1995. By contrast the Shell has
stagnated, yet it has moved just enough not to have the benefit of
stability.



Re: feature suggestion: ability to expand a set of elements of an array or characters of a scalar, given their indices

2024-06-27 Thread Martin D Kealey
On Thu, 27 Jun 2024, 17:08 Oğuz,  wrote:

> On Thursday, June 27, 2024, Martin D Kealey 
> wrote:
>
>> [...]
>
>
> That's too much to read
>

You're under no obligation to read what I write, but then kindly don't
pretend that you're "replying" to me.

Perl is not a good example to follow.
>

Perl isn't a perfect language, but it's an immense improvement over the
Shell language, even with Bash's enhancements, and more to the point, it's
an example that proves that it's possible to evolve and rebuild a language
to escape from various archaic crazinesses.

In particular modern Perl scripts no longer use dynamic scoping, unquoted
words that may or may not be string literals, or magic variables to
globally tweak behaviours. (Those things are still there, but it's possible
to avoid using them because there are better ways to get the same results.)

Why not extend the arithmetic expansion syntax to allow generating multiple
> results when subscripting indexed arrays?
>

Why limit this to subscripts?
Why not use that for generating lists directly?

Like `${a[1; 2; 4]}', `${a[3..5; 7]}', `${a[1..10..3]}', etc. These would
> expand like `$@' when in double quotes and like `$*' when being assigned to
> a variable.
>

Why limit this to numeric indexing?
Why not support associative arrays?

-Martin


Re: feature suggestion: ability to expand a set of elements of an array or characters of a scalar, given their indices

2024-06-26 Thread Martin D Kealey
On Thu, 27 Jun 2024 at 06:30, Chet Ramey  wrote:

> On 6/26/24 2:18 PM, Zachary Santer wrote:
>
> >> On Tue, Jun 11, 2024, 12:49 PM Zachary Santer 
> wrote:
> >>>
> >>> $ array=( zero one two three four five six )
> >>> $ printf '%s\n' "${array[@]( 1 5 )}"
> >>> one
> >>> five
> >
> > This is different functionality.
>
> Equivalent to printf '%s\n' "${array[1]}" "${array[5]}". The innovation Zach
> wants is to have a single word expansion to do this.
>

Surely the point is to handle the case where we don't know in advance how
many elements will be wanted.

In effect, it would mimic Perl's @array[@indices] and @hash{@keys}
functionality, where we supply an arbitrary list of indices or subscripts,
and get back the corresponding values.

Using the proposed syntax we would be able to write:

array=( '' one two three four five six )
indices=( 1 0 6 7 5 )
printf '%s, ' "${array[@]( "${indices[@]}" )}"
printf end\\n

to get

one, , six, five, end

(Note that there are only 4 words resulting from the expansion, since there
is no element '7' in 'array'. Unfortunately - and unlike Perl - Bash
doesn't have "undef", so we have to make do with getting back fewer values
in the resulting list if some requested array elements are unset, or if
some indices exceed the size of the array.)
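
For comparison, a rough approximation of that behaviour with today's Bash (a
sketch of a helper function; needs Bash 4.3+ for namerefs, and unset elements
simply contribute no output):

array_slice() {   # usage: array_slice NAME INDEX...
  local -n _arr=$1; shift
  local i
  for i; do
    [[ -v _arr[$i] ]] && printf '%s, ' "${_arr[$i]}"
  done
}
array=( '' one two three four five six )
indices=( 1 0 6 7 5 )
array_slice array "${indices[@]}"; printf 'end\n'   # one, , six, five, end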

I agree that this syntax looks ugly, but since [@] and [*] don't function
as subscripts, it's tricky to improve on.

My suggestion would be to generalise, turning [@] and [*] into fixed
syntactic tokens that can be combined with "ordinary" subscripting, or left
without subscripts to retain their current meanings:

  "${array[*][index]}"   (a long-hand version of "${array[index]}")
  "${array[@][index]}"   (gives "${array[index]}" if it exists, but is
completely elided if it doesn't - similar to how "$@" can result in no
words, not an empty word)

Or maybe we can have some mechanism so that '@[' doesn't get treated as the
start of an '@' modifier; and we could use:

  "${array*[index]}"
  "${array@[index]}"

(For the rest of this discussion I'm just going to mention the '@' form;
please infer the corresponding '*' form.)

After doing this, I would start working on syntaxes for list-slicing in
various ways, perhaps:

  "${array@[[ list of indices ]]}"

"list of indices" is an ordinary word list; it's split up at unquoted $IFS,
then each of the resulting words is used as a subscript.

I would also revamp how numeric range slices are done (*1):

  "${array@[ start_index : count ]}"
  "${array@[ start_index ... end_index ]}

For all of these expansions, where each subscripted element of the array
exists, it provides a 'word' in the resulting expansion, and where it
doesn't exist, no word is provided.
With '@', the list is kept as separate words despite being quoted; with
'*', the resulting list is joined in the traditional manner.

But I would look even further ahead...

Firstly, I acknowledge Bash has had to comply with historical expectations,
POSIX requirements, and precedent set by ksh.
However, having an array subscript expansion change its behaviour based on
whether or not a "declare -A" statement has been executed, possibly in a
different function or even a different file; that is - by modern standards
at least - a rather poor language design choice. (*1)

I'm talking about whether the subscript undergoes arithmetic expansion.

So I also propose that we should follow Perl in having separate array
indexing and map subscripting syntaxes, so that it's no longer necessary to
use "declare -A", and more to the point, no longer necessary to go look for
it while reading someone else's code. (*2)

(I'm about to suggest some syntax, but the exact form isn't really my main
point; what's really important is that you would be able to read a $
expansion and tell at a glance whether the subscript will be subject to
arithmetic expansion. (*3))

As a secondary issue, deferring *parsing* of arithmetic expressions (until
the containing command is executed) obscures syntax errors, delays their
reporting, and degrades performance. I would change that, either globally
when « shopt -s early_math_parse » is in effect, or in recognized contexts
like this new array indexing syntax. (*4)

When using the new array indexing syntax, the index would be parsed as an
arithmetic expansion while the surrounding commands are being parsed (*5)
(and thus ALWAYS evaluated as a numeric expression), and when using the map
subscripting syntax it would NEVER be subject to arithmetic expansion.

One possible syntax would be:

  "${assoc_array@{key}}"
  "${assoc_array@{{list of keys}}}"

which would differ from the previous in that 'key' and 'list of keys' would
be guaranteed NOT to undergo numeric expansion; importantly, this can be
determined at parse time without needing to have executed a 'declare -A'
statement. (This becomes more important if we look to eventually
implementing lexically scoped variables some time in the future.)

If you really can't stomach using {} around subs

Re: proposed BASH_SOURCE_PATH

2024-06-26 Thread Martin D Kealey
I've found some existing code that will break if words in ${BASH_SOURCE[@]}
don't match the filepath given to '.' or 'source':
  [[ ${BASH_ARGV[0]} = "${BASH_SOURCE[0]}" ]]
which is part of a test to determine whether any args were provided after
"source filename".

I use this at the end of my file called «autoload.bash», which after
defining a function called «autoload», has:

if [[ -n "$*" ]] && ! {
> (( BASH_ARGC[0] == 1 )) && [[ ${BASH_ARGV[0]} = "${BASH_SOURCE[0]}" ]]
> ||
> (( BASH_ARGC[0] == 0 && ${#BASH_ARGC[@]} > 0 )) # in case this gets
> fixed sometime
>}
> then
> autoload -- "$@"
> fi


This is trying to figure out whether any arguments were provided after:

. "$path/autoload.bash"
>

It can't simply look at "$@" because when *no* args are given, the outer
args are left bound to "$@" (and can be modified by shift & set).

But unfortunately in this case, BASH_ARGC[0] is 1 rather than 0, and the
filename provided to "source" (now ${BASH_SOURCE[0]}) is prepended to
BASH_ARGV. That's where the test for BASH_SOURCE[0] == BASH_ARGV[0] is
needed. It's still only probabilistic, because BASH_ARGV & BASH_ARGC have
some other weird behaviours, but it has a higher likelihood of being
correct, and in the cases where it's giving a false negative, the
parameters would be invalid for my use case.

This test will clearly break if BASH_SOURCE does not contain exactly the
parameter given to "source" (or ".").

However this kludge is only necessary because of the pollution of
BASH_ARGC/BASH_ARGV, and my test will work correctly if that pollution is
removed.

I would be happy to always have $(realpath $0) or $(realpath
$sourced_filename) in BASH_SOURCE if there was also a concomitant change to
prepend '0' onto BASH_ARGC (and not change BASH_ARGV) when a file is sourced (or
a function is called) without any args. However I worry that the latter
change might adversely affect someone else's code.

-Martin



On Thu, 27 Jun 2024 at 00:17, Martin D Kealey 
wrote:

>
>
> On Wed, 26 Jun 2024, 03:14 Chet Ramey,  wrote:
>
>> On 6/19/24 6:12 PM, konsolebox wrote:
>>
>> > Alternatively, have BASH_SOURCE always produce real physical paths
>> > either by default or through a shopt.
>>
>> This is the best option. I don't think changing bash to do this by
>> default would have negative side-effects.
>>
>
> Just to be clear, would this result in $0 and ${BASH_SOURCE[@]:(-1):1}
> potentially yielding different values?
>
> (I would certainly hope so, otherwise how would we be able to alter the
> behaviour of a script based on the name used to invoke it? And before
> anyone tells me what a bad idea that would be, sometimes it's necessary to
> mimic historical behaviour such as gzip vs gunzip, and there are definitely
> scripts out there that rely on this.)
>
> However I do see a minor downside: looking at BASH_SOURCE to decide
> whether a given ancestor in the call chain is actually $0, the naive string
> comparison will now fail. This isn't important to most programs, but may be
> necessary for, say, a debugging framework.
>
> I have a small pure-Bash library that mimics Perl's "Carp.pm", providing
> "carp" , "cluck", "croak", and "confess". One feature I'm in the process of
> adding is suppression of specific "modules" from the backtrace displayed
> after a message; in this context I equate "module" with "source file".
>
> Since this is new for me, I can just document that users must use
> ${BASH_SOURCE[@]:(-1):1} rather than $0 when asking my "Carp" module to
> exclude its own "main" from backtraces.
>
> But if someone else has already implemented this, then their code will be
> subtly broken by this proposed change to Bash.
>
> -Martin
>
>>


Re: proposed BASH_SOURCE_PATH

2024-06-26 Thread Martin D Kealey
On Wed, 26 Jun 2024, 03:14 Chet Ramey,  wrote:

> On 6/19/24 6:12 PM, konsolebox wrote:
>
> > Alternatively, have BASH_SOURCE always produce real physical paths
> > either by default or through a shopt.
>
> This is the best option. I don't think changing bash to do this by default
> would have negative side-effects.
>

Just to be clear, would this result in $0 and ${BASH_SOURCE[@]:(-1):1}
potentially yielding different values?

(I would certainly hope so, otherwise how would we be able to alter the
behaviour of a script based on the name used to invoke it? And before
anyone tells me what a bad idea that would be, sometimes it's necessary to
mimic historical behaviour such as gzip vs gunzip, and there are definitely
scripts out there that rely on this.)
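
For the record, that historical pattern is just a dispatch on the invocation
name, along these lines (names illustrative only):

  case ${0##*/} in
    (gzip)   mode=compress ;;
    (gunzip) mode=decompress ;;
    (*)      echo >&2 "unexpected invocation name $0"; exit 2 ;;
  esac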

However I do see a minor downside: looking at BASH_SOURCE to decide whether
a given ancestor in the call chain is actually $0, the naive string
comparison will now fail. This isn't important to most programs, but may be
necessary for, say, a debugging framework.

I have a small pure-Bash library that mimics Perl's "Carp.pm", providing
"carp" , "cluck", "croak", and "confess". One feature I'm in the process of
adding is suppression of specific "modules" from the backtrace displayed
after a message; in this context I equate "module" with "source file".

Since this is new for me, I can just document that users must use
${BASH_SOURCE[@]:(-1):1} rather than $0 when asking my "Carp" module to
exclude its own "main" from backtraces.

But if someone else has already implemented this, then their code will be
subtly broken by this proposed change to Bash.

-Martin



Re: Proposal for a New Bash Option: failfast for Immediate Pipeline Failure

2024-06-25 Thread Martin D Kealey
Conceptually this sounds useful, but how exactly would it work?

• Is any attempt made to terminate the other processes in the pipeline, or
do you just not delay by waiting for them immediately?
  → If attempting to terminate:
- using which signal?
- what happens if the process refuses to die?
  → If allowing them to continue in the background:
- can the script learn the pids of the incomplete parts of the pipeline
- can it explicitly wait for them (as if they were ordinary background
processes, or otherwise)
- if the script doesn't wait for them, when should Bash reap them?
• Normally it's expected that if any part of a pipeline exits, all
preceding parts would eventually receive SIGPIPE upon writing to stdout.
Would you treat the parts to the left of the terminated part differently
from the parts to its right? How? Would there be any other positional
differences?
• How would this interact with pipefail or lastpipe?

-Martin

-- Forwarded message -
From: ama bamo 
Date: Tue, 25 Jun 2024, 02:21
Subject: Proposal for a New Bash Option: failfast for Immediate Pipeline
Failure
To: 


Dear Bash Maintainers,

I have encountered a challenge with the current implementation of pipelines
in Bash, specifically regarding subshells and the pipefail option. As
documented, each command in a pipeline is executed in its own subshell, and
Bash waits for all commands in the pipeline to terminate before returning a
value.

To address these issues, I propose the introduction of a new option,
failfast, which would immediately terminate the pipeline if any command in
the pipeline fails. This would streamline error handling and provide more
predictable script execution, aligning with user expectations in many
common use cases.

Here is an example illustrating the proposed behavior:


#!/bin/bash
set -o failfast
nonexisting-command | sleep 10   # Pipeline exits immediately without sleeping for 10 seconds


This would provide a more intuitive and robust error handling mechanism for
pipeline commands, enhancing the usability and reliability of Bash
scripting.

I look forward to your feedback.

Best regards,

Mateusz Kurowski


Re: proposed BASH_SOURCE_PATH

2024-06-21 Thread Martin D Kealey
I support BASH_SOURCE_PATH as replacing the normal PATH search only for "."
and "source".

In addition I propose some new '~' expansions which will give concise
expression of dirname+realpath without penalizing code that does not need
it.

The primary intention is to allow the "standard preamble" to be reduced to
simply

  BASH_SOURCE_PATH=

or in more complex situations perhaps

  BASH_SOURCE_PATH+=:~./lib:~./../lib

so that

  . module.bash

will read "module.bash" from a location relative to the current file,
without regard for $PWD or $PATH.

The short preamble "BASH_SOURCE_PATH=" relies on a (proposed) rule that when
BASH_SOURCE_PATH contains an empty element, it is treated as equivalent to "~.".

(It could be argued that an explicit syntax such as
BASH_SOURCE_PATH=${BASH_SOURCE[0]@R@D} would be more general, and I would
favour also implementing those modifiers, but it's also a well-understood
engineering principle that we should build systems so that it's easier to do
the "best thing" or "right thing". Also, unknown ~ expansions will pass
through unchanged on any version of Bash that does not understand them,
whereas ${var@R@D} will provoke a syntax error.)

The new ~ expansion forms are ~@[DIGITS] and ~.[DIGITS] (where [DIGITS]
denotes zero or more decimal digits). The ~@ form expands to a "partially
normalised form" of a selected path as explained below. The ~. form
expands to the dirname of the corresponding ~@ form.

I defined this "partially normalised form" as approximating what "realpath"
provides, but with weaker guarantees: the basename of the expansion is
guaranteed to be the final value after resolving symlinks, but the path
before that may be any path that is functional, especially if $PWD is
inaccessible. In particular it may not be an absolute path, or might start
with a prefix like /proc/1234/fd/99/.

With no digits, ~@ expands to a partially normalised path to the current
file, ${BASH_SOURCE[0]}.

When zero, ~@0 expands to a partially normalised $0.

Otherwise ~@NUMBER expands to a partially normalised form of
${BASH_SOURCE[NUMBER]}. (This might be omitted from the initial
implementation; we probably need more experience to see if it's actually
useful.)
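
To illustrate with invented paths, inside a file sourced as
/opt/app/lib/mod.bash these would hypothetically expand as:

  echo ~@    # /opt/app/lib/mod.bash  (partially normalised ${BASH_SOURCE[0]})
  echo ~.    # /opt/app/lib           (its dirname)
  echo ~@0   # partially normalised $0, e.g. /opt/app/bin/main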

-Martin

On Thu, 20 Jun 2024, 10:12 konsolebox,  wrote:

> On Thu, Jun 20, 2024 at 4:05 AM Will Allan  wrote:
> > But, I still don’t like it. I have to start off each script with a slow
> command substitution (subshell) which introduces a variable that I don’t
> really want, but it’s too slow to do this repeatedly:
>
> I agree.
>
> > source -- "${BASH_SOURCE_PATH}/../lib/foo.sh"
>
> You misunderstood the use of BASH_SOURCE_PATH though.  It's proposed
> to be an alternative to PATH which is only respected by source.  I
> suggest another name.  And that is BASH_SOURCE_REAL.
>
> Alternatively, have BASH_SOURCE always produce real physical paths
> either by default or through a shopt.
>
> Any additional feature that doesn't allow loading a script relative to
> the caller without external help is practically useless.
>
>
> --
> konsolebox
>
>


Re: REQUEST - bash floating point math support

2024-06-13 Thread Martin D Kealey
On Thu, 13 Jun 2024 at 09:05, Zachary Santer  wrote:

>
> Let's say, if var is in the form of a C floating-point literal,
> ${var@F} would expand it to the locale-dependent formatted number, for
> use as an argument to printf or for output directly. And then ${var@f}
> would go the other way, taking var that's in the form of a
> locale-dependent formatted number, and expanding it to a C
> floating-point literal.
>

How about incorporating the % printf formatter directly, like ${var@%f} for
the locale-independent format and ${var@%#f} for the locale-specific format?

However any formatting done as part of the expansion assumes that the
variable holds a "number" in some fixed format, rather than a localized
string.
Personally I think this would actually be a good idea, but it would be
quite a lot bigger project than simply added FP support.

-Martin


Re: [PATCH] tests: printf: provide explicit TZ start/end

2024-06-13 Thread Martin D Kealey
On Fri, 14 Jun 2024 at 10:52, Robert Elz  wrote:

>   | I also note a minor bug/issue with printf in Bash 5.3-alpha: the
> builtin
>   | printf treats TZ=CET-1CEST,M3.5,M10.5/3 as if it were oddly-named UTC.
>
> That's user error, POSIX format requires 3 values after the M:


"User error" is not the only possible interpretation, even if "POSIX says
something else". (And I specifically said "issue" rather than "bug".) It's
an entirely reasonable *extension* to POSIX.

I only knew that the 2-value version was possible because it was documented
*and recommended* when I started using it in 1988, with substantially the
same description in Ultrix, SunOS, Xenix, and SCO-Unix. In 1988 I used
TZ=NZST-12NZDT,M10.5,M3.1 but the next year (1989) changed to
TZ=NZST-12NZDT,M10.1,M3.3 and that remained unchanged until 2007, when it
changed to TZ=NZST-12NZDT,M9.5,M4.1. Obviously I've only *kept* using this
shorter format because *nothing* objected; in particular the GNU "date" and
"ls" and "touch" commands interpret these abbreviated timezone rules
according to this "extension" format. (Some time around 2014 I switched to
using the "named" timezones with simply TZ=NZ or TZ=Pacific/Auckland, but
I've occasionally used the old style when I needed to test behaviour in
"odd" time zones.)

To be clear, this isn't a bug report, but rather a feature request to
implement a common extension, and to interpret TZ=...,M*m.w* in printf in line
with the rest of GNU.

-Martin


Re: Poor messages when the '#!' file isn't found

2024-06-13 Thread Martin D Kealey
On Fri, 14 Jun 2024 at 06:13, Dan Jacobson  wrote:

> ./k
> make: ./k: No such file or directory
>

This is a problem with the POSIX spec for the execve system call and its
obligatory return codes.

"No such file or directory" is arguably the correct message to show when
the kernel returns the ENOENT error code. And *normally* it's appropriate
to attach that to the filename that was passed to the kernel; however in
this case the combination may be confusing.

I would argue that a more appropriate error code would be ENOEXEC ("exec
format error", which is what you get when the kernel lacks an "interpreter"
for ELF or a.out or JAR files) or perhaps a new code such as ENOINTERP.
However, this should be addressed to the POSIX committee, and to the Kernel
developers for Linux, FreeBSD, Windows, and others.

Bash's message is intentionally more vague precisely *because* this is a
known issue.

-Martin


Re: [PATCH] tests: printf: provide explicit TZ start/end

2024-06-13 Thread Martin D Kealey
On Tue, 11 Jun 2024 at 21:52, Grisha Levit  wrote:

> POSIX says about the TZ variable:
>
> If the dst field is specified and the rule field is not, it is
> implementation-defined when the changes to and from DST occur.
>
> musl seems to interpret `TZ=EST5EDT` as having DST always in effect,
> causing the tests that rely on the glibc behavior (of defaulting to
> America/New_York transition rules) to fail.
>

It appears that glibc treats the 6 US timezones not as specifications using
an implementation-defined rule, but rather as references to files with names
like /usr/share/zoneinfo/PST8PDT. Perhaps the issue is that systems using
musl don't include the tzdata package?

If the tests are being updated, please can southern-hemisphere daylight
saving also be checked. For thoroughness, I suggest testing:
* Chatham Islands time TZ=NZCST-12:45NZCDT,M9.5.0/2:45,M4.1.0/3:45
* Australian NSW & Vic time TZ=AEST-10AEDT,M10.1.0,M4.1.0/3
* Australian SA time TZ=ACST-9:30ACDT,M10.1.0,M4.1.0/3
* British time TZ=GMT0BST,M3.5.0/1,M10.5.0
* Central European time TZ=CET-1CEST,M3.5.0,M10.5.0/3

I also note a minor bug/issue with printf in Bash 5.3-alpha: the builtin
printf treats TZ=CET-1CEST,M3.5,M10.5/3 as if it were oddly-named UTC.

For 30+ years it has been my experience that the '.0' for Sunday is not
required, either in practice or (I *think*) by the POSIX specification.

-Martin


Re: Examples of concurrent coproc usage?

2024-06-08 Thread Martin D Kealey
On Wed, 10 Apr 2024 at 03:58, Carl Edquist  wrote:

> Note the coproc shell only does this with pipes; it leaves other user
> managed fds like files or directories alone.
>
> I have no idea why that's the case, and i wonder whether it's intentional
> or an oversight.
>

Simply closing all pipes is definitely a bug.

This is starting to feel like we really need explicit ways to control
attributes on filedescriptors.

It should be possible to arrange so that any new subshell will keep
"emphemal" filedescriptors until just before invoking a command.

One mechanism would be to add two new per-fd attributes: *inherited-by-fork*,
and *ephemeral*.

The *inherited-by-fork* attribute would be set on any fd that's carried
through a fork (especially the implicit fork to create a pipeline) and
reset on any fd that's the result of a redirection.

The *ephemeral* attribute is set on any coproc fd (or at least, any that's a
pipe to the stdin of a coproc).

Then when *both* attributes are set on an fd, it would be closed just
before launching any inner command, after any redirections have been done.
That way we could simply clear the *ephemeral* attribute on a coproc fd if
that's desired, but otherwise avoid deadlocks in the simple cases.

-Martin


Re: Bug tracking

2024-06-08 Thread Martin D Kealey
On Tue, 2 Apr 2024 at 00:31, Chet Ramey  wrote:

> On 3/31/24 8:34 PM, Martin D Kealey wrote:
> > That's a good start, but it seems incomplete, and there's little --
> perhaps
> > no -- overlap with bug reports in this list.
>

And this is still the most fundamental problem; the submission of bug
reports, the discussion around them, and the proposed fixes currently are
split between multiple platforms that don't talk to each other. So *very*
few people actually track everything.


> > Has bashbug always sent email to bug-bash@gnu.org, or was it previously
> fed into Savannah?
> bashbug long predates savannah.
>

We are instructed to submit bug reports via bashbug, or by posting to this
mailing list, but none of those reports reach Savannah.

When Savannah became the primary repository, bashbug should have been
updated to post to Savannah, and/or an email receiver should have been
created to inject into Savannah's "support" queue. Likewise, details
entered directly on Savannah should be sent to the mailing list. ("Overdue"
would be an understatement.)

First impressions when I sign into Savannah:

   - There's no "dashboard" or "overview" of stuff that I personally am
   likely to need. In particular, there's no mention of any projects I'm
   trying to engage with. OK I'll try to add some:
   - The sidebar has a (relatively) obvious link "My Groups", taking me to
   "My Group Membership".
   Nope, "You're not a member of any public group".
   - The right panel has "Request for inclusion", which sounds about right.
   OK, let's search for Bash. Yay, found ... oh wait, nope, "Bash Bear Trap"
   is not "Bash".
   - Let's back up, and use the site search in the side bar.
   OK, *this* time I see "The GNU Bourne-Again SHell" (and the Bear Trap
   again, and 13 other projects).
   Let's follow that link and ... yay, found it.

At this point I save a bookmark, because that was the short-and-simple
version of what was actually a much longer and far more tedious process of
discovery.

The summary page says that this project has:

   - 3 mailing lists
   - an SVN repository (which turns out to be empty)
   - a Git repository
   - a "tech support manager" (which turns out to be a general inquiries
   queue)
   - a "patch manager" (which turns out to be a bug-reporting and
   feature-request queue)

Initial impression good.
Well okay.
For all of 10 minutes, by which time I've discovered that these facilities
have no mutual integration.

It seems like the "support" queue is intended for interactions with users
and the "patches" queue is for interactions between designers, developers
and Q/A reviewers. That arrangement could have some benefits, but
unfortunately:

   - The support and patch queues don't talk to each other. Once an actual
   bug is identified from a support request, a separate "patch" issue has to
   be opened, without any automatic cross referencing. And therefore when a
   bug fix finally makes it into a release, there's no automated process to
   close any original support requests.
   - Neither the support queue nor the patches queue has any integration
   with this email list (nor, I suspect, with any others).
   - Any cross-referencing between them is up to project members to perform
   manually.
   - The git repo has a fixed set of branches created by the members; other
   users cannot create their own branches.
   - When it says "patch", it really still means an actual context diff
   text file.
   Anyone used to using a modern source management system (e.g. Piper,
   Perforce, Bitbucket, Gitlab, or GitHub) would expect "patch" to mean a git
   branch or equivalent, which can be amended (with additional commits),
   merged, or rebased, all while being subject to regression testing, and then
   merged into the "master" branch once it has QA approval. (A major effect of
   this lack is that a textual patch can "go stale" while the master branch in
   the git repo moves on, without any tracking of how far out of date it's
   become.)
   But there is no linkage between the patch queue and any git repo, nor is
   there a branch per active patch issue.
   - We cannot use ssh to connect to git.savannah.gnu.org, despite the
   instructions for cloning the repo saying we should use
   @git.savannah.gnu.org:/srv/git/bash.git rather than just
git.savannah.gnu.org:/srv/git/bash.git
   .

This combination of missing features (and outright mis-features) amounts to
the antithesis of how one should design a modern collaborative development
process.

And that's a big part of why a one-man-band running the whole show is the
only feasible support model.

> Savannah seems 

Re: [PATCH v2 5/8] builtins/source: parse the -i option

2024-05-24 Thread Martin D Kealey
On Tue, 21 May 2024 at 23:16, Koichi Murase  wrote:

> On Tue, 21 May 2024 at 14:56, Phi Debian wrote:
> > 'May be' bash could investigate the ksh93/zsh $FPATH autoload, but don't
> > know if that would be good enough for the initial purpose.
>
> There are already shell-function implementations at
> /examples/functions/autoload* in the Bash source. They reference FPATH
> to load functions, though one needs to call `autoload' for each
> function in advance (by e.g. `autoload "$fpath_element"/*' ).
>

My solution to this was to call 'autoload --all', which would gather all
the filenames in FPATH (*1) and create autoload stubs for them.

Alternatively one could define a commandnotfound function to defer this
until actually needed.

(*1 I actually used a different variable name, since I wasn't providing
exactly the same semantics as ksh, but that's relatively cosmetic)
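
A minimal sketch of that 'autoload --all' idea (assuming one public function
per file, with filenames that are valid function names; needs Bash 4.4+ for
${var@Q}, and it ignores the precedence concern discussed below):

autoload_all() {
  local dir file fn
  local IFS=':'
  for dir in $FPATH; do             # split FPATH on ':'
    for file in "$dir"/*; do
      [[ -f $file ]] || continue
      fn=${file##*/} fn=${fn%.bash}
      # the stub replaces itself with the real definition on first call
      eval "$fn() { source ${file@Q} && $fn \"\$@\"; }"
    done
  done
}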

However, I personally do not think the FPATH mechanism is useful
> because a file can only contain one function per file. Significantly
> non-trivial functions are usually implemented by a set of helper
> functions or sub-functions.


Defining extra (private) functions in a file loaded from FPATH does no
harm, as long as their names don't conflict.


> Also, in libraries, we usually have a set
> of functions that are closely related to one another and share the
> implementations. I don't think it is practical to split those
> functions into dozens or hundreds of files.


I would hesitate to call what I've done "a manager", but my approach has
been to allow a file to "declare" all the public functions it defines, and
then simply have symlinks (or even hard links) to a single underlying file.

I copied Perl and named my command "require" because it's a run-time check,
not a parse-time one.

Each file should contain a "provides" statement for each (public) function
it defines; except it's optional for a function name that matches the only
possible "require" name. Using this mechanism it's also possible to require
a module rather than an individual function.

The autoloader stubs I mentioned above is just:
  func() { require --from=/path/to/library/func.bash func && func "$@" ; }

(In order to get the correct search precedence, these would need to be
generated by scanning FPATH in reverse order, so that later ones will be
replaced by earlier ones; amongst other things, I define my own FPATH-like
var to have the reverse precedence order.)

Originally I just made autoloader stubs like
  func() { source /path/to/library/func.bash && func "$@" ; }
however it turned out that using "require" instead of "source" simplified
"provides" since it could then rely on local variables from "require" being
available.

It would also be slow to
> read many different files, which requires access to random positions
> on the disk.
>

My approach allows multiple functions per file, and only loads each file
once, no matter how many names it has.

-Martin


Re: [PATCH v2 5/8] builtins/source: parse the -i option

2024-05-21 Thread Martin D Kealey
On Tue, 21 May 2024 at 03:44, Chet Ramey  wrote:

> On 5/17/24 1:15 PM, Robert Elz wrote:
>
> >| If `nosort' means no sorting, there is no imposed ordering, and
> ascending
> >| and descending are meaningless.
> >
> > Sure, but directory order, and reverse directory order aren't (and that's
> > just a difference between the order in which you create the list as each
> > new dirent is read from the directory - does it go at the head or tail).
>
> That's changing from one random order to another.
>

In the *general* case yes, the order should be treated as random.
For example, readdir on a Linux ext4 fs returns directory entries in
pseudo-random order; this is necessary to allow successive readdir calls to
traverse a large directory, without skipping or repeating an existing
entries, and without locking (so allowing additions, removals, and
renamings to continue). (Connected to this, file positions returned by
lseek(dir_fd, 0, SEEK_CUR) are pseudorandom numbers identifying the next
directory entry to be fetched, not an indication of bytes read so far.)

But it's not *always* random.
For example, readdir on Linux's /proc/$pid/fd returns filedescriptors in
ascending numerical order; likewise readdir on /proc returns process IDs in
ascending numerical order (preceded by a bunch of other stuff, in an order
that's well-defined in the kernel though perhaps not obvious to most users).
Reversing those could be useful to some people.
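
For example, on Linux (ls -U lists entries in readdir order, unsorted):

  ls -U /proc/$$/fd   # file descriptors appear in ascending numeric order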

-Martin


Re: [PATCH v2 5/8] builtins/source: parse the -i option

2024-05-17 Thread Martin D Kealey
On Fri, 17 May 2024 at 04:18, Chet Ramey  wrote:

> On 5/16/24 11:54 AM, G. Branden Robinson wrote:
> > At 2024-05-16T11:36:50-0400, Chet Ramey wrote:
> >> On 5/15/24 6:27 PM, Robert Elz wrote:
> >>> and any attempt to use a relative path (and you
> >>> can exclude ./anything or ../anything from that if you prefer - ie:
> >>
> >> Those are not relative paths.
> >
> > !
> >
> > POSIX 1003.1-202x/D4, §3.311 defines "relative pathname" thus:
> >
> > "A pathname not beginning with a  character."
> >
> > Can you clarify?  Does Bash have its own definition of this term?
>
> In this specific case, I suppose. In default mode, `source' doesn't use
> $PATH for ./x and ../x, but does for other relative pathnames.
>

I assumed that "default mode" means "not posix mode", but if so that
doesn't hold up:

$ mkdir tmp/a
$ cat >tmp/a/b
echo in B
$ ( PATH=$PWD/tmp/a source b )
in B
$ ( PATH=$PWD/tmp source a/b )
bash-latest: a/b: No such file or directory
$ echo $BASH_VERSION
5.3.0(2)-alpha
$ ( p=$( realpath "$0" ) ; echo "git commit ${p##*/bash/}" )
git commit aadb6ffb93359891760c58008539f549f06c5140/bin/bash
$ shopt -o posix
posix   off

-Martin


Re: proposed BASH_SOURCE_PATH

2024-05-16 Thread Martin D Kealey
On Thu, 16 May 2024 at 03:03, Chet Ramey  wrote:

> On 5/14/24 2:08 AM, Martin D Kealey wrote:
> > I wholeheartedly support the introduction of BASH_SOURCE_PATH, but I
> would
> > like to suggest three tweaks to its semantics.
> >
> > A common pattern is to unpack a script with its associated library &
> config
> > files into a new directory, which then leaves a problem locating the
> > library files whose paths are only known relative to $0 (or
> > ${BASH_SOURCE[0]}).
>
> That assumes a flat directory structure for the script and its associated
> files, correct? How common is that really? Or is it more common to have
> something like the script in somewhere/bin, files to be sourced in
> somewhere/lib, and so on?
>

On the contrary, I would expect a typical setting to be something like
 BASH_SOURCE_PATH=../lib:../config
Or alternatively, that people will write:
 source -i ../lib/module.bash
 source -i ../config/theproject.conf
making use of the implicit '.' when BASH_SOURCE_PATH is unset or empty.


> > 1. I therefore propose that where a relative path appears in
> > BASH_SOURCE_PATH, it should be taken as relative to the directory
> > containing $0 (after resolving symlinks), rather than relative to $PWD.
>
> Is this pattern really common enough to break with existing behavior like
> you propose?
>

It's something that people try to do often enough that there's a HOWTO for
it in Greg's Bash FAQ, and a bot auto-response in ircs:irc.libera.chat#bash.

And sadly, people do indeed often make scripts that are brittle or outright
broken because they assume $(dirname $0) == '.'
Just search for how many shell scripts suggest using './name' to invoke
them.

To be fair, the commonest case is looking for a "configuration file",
rather than a collection of modules

But yes, unpacking any tarball or cloning any git repo will result in a
directory tree grafted to a random point in the filesystem, and it'll stay
that way if it doesn't have an explicit installation procedure (such as
"make install").


You also suggested not having '-i' and just enabling the new behaviour when
BASH_SOURCE_PATH is set.
I strongly disagree with this for two reasons.
(1) it's barely tolerable to add more action-at-a-distance by introducing a
new variable; but adding *invisible* action at a distance is a language
design antipattern. Having `-i` on the `source` command documents that new
behaviour is expected.
(2) We want it to fail (with an error such as `source: -i: invalid option')
when run on a version of Bash that can't provide this behaviour.

-Martin


Re: proposed BASH_SOURCE_PATH

2024-05-16 Thread Martin D Kealey
On Thu, 16 May 2024 at 02:48, Koichi Murase  wrote:

> On Tue, 14 May 2024 at 15:09, Martin D Kealey wrote:
> > 1. I therefore propose that where a relative path appears in
> > BASH_SOURCE_PATH, it should be taken as relative to the directory
> > containing $0 (after resolving symlinks), rather than relative to $PWD.
>
> [...]
>
However, I think what is generally achieved by proposal 1 would be
>
>   source "$__dir__/ BASH_SOURCE_PATH>/xxx/yyy/libzzz.bash"
>
> This might be useful when the middle path elements of the library
> location are ambiguous, yet the candidates are common with different
> $__dir__. However, I don't have an idea about whether this has a
> significant demand. What would be the use case? I naively think only
> `.' is normally useful for the suggested interpretation of relative
> paths.
>

That's fairly accurate.
I would expect there would normally be only one, but the utility is that it
can go anywhere between absolute paths, not just first or last.
It also leads automatically to (what I consider) the optimal result when
BASH_SOURCE_PATH is empty or unset.

A more sophisticated usage might be to load both "modules" and
"configuration settings" the same way:

BASH_SOURCE_PATH=../lib:/usr/share/fubar/bash-lib:${XDG_CONFIG_HOME:-$HOME/.local/config}/fubar/config:../config:/usr/share/fubar/config
source -i module1.bash
source -i custom_module.bash
source -i fubar.config

-Martin


Re: proposed BASH_SOURCE_PATH

2024-05-14 Thread Martin D Kealey
On Tue, 14 May 2024 at 20:10, konsolebox  wrote:

> On Tue, May 14, 2024 at 2:09 PM Martin D Kealey 
> wrote:
> > 2. Search BASH_SOURCE_PATH when any relative path is given, not just a
> path
> > that lacks a '/', so that libraries can be organized into subdirectories.
>
> I disagree with this.  Paths beginning with ./ or ../ should be
> considered explicit and not searched in BASH_SOURCE_PATH.
>
> It should use the directory of the calling script as reference when
> using -i or $PWD when not.
>

For the particular cases of './' and '../' that seems reasonable when the
fallback is the location of the script (proposal 4), but in general I would
prefer "source -i foo/bar/zing" to honour BASH_SOURCE_PATH.

I'm concerned that doing both would introduce an entirely new dichotomy for
programmers to have to remember, so perhaps "skip path searching" should be
controlled by a separate switch, perhaps '-s'?

Yes one could instead write "BASH_SOURCE_PATH='' source -i ...", but that
would mess up the search path for inner source commands, and avoiding that
is one of the reasons for doing this in the first place. (And yet again I'm
wishing that "local" could be used outside functions.)

-Martin


proposed BASH_SOURCE_PATH

2024-05-13 Thread Martin D Kealey
I wholeheartedly support the introduction of BASH_SOURCE_PATH, but I would
like to suggest three tweaks to its semantics.

A common pattern is to unpack a script with its associated library & config
files into a new directory, which then leaves a problem locating the
library files whose paths are only known relative to $0 (or
${BASH_SOURCE[0]}).

1. I therefore propose that where a relative path appears in
BASH_SOURCE_PATH, it should be taken as relative to the directory
containing $0 (after resolving symlinks), rather than relative to $PWD.

As an interim step until that's implemented, please ignore any relative
entries in BASH_SOURCE_PATH, so that users who really want the cwd in
BASH_SOURCE_PATH get used to writing $PWD or /proc/self/cwd instead.

2. Search BASH_SOURCE_PATH when any relative path is given, not just a path
that lacks a '/', so that libraries can be organized into subdirectories.

3. To avoid accidentally loading a script rather than a library, while
searching BASH_SOURCE_PATH, ignore any files that have exec permission,
inverting the check normally made for executables in PATH. This would keep
executables and libraries functionally separate, even if they're commingled
in the same directories.

Yes I know that some folk think it's occasionally useful to have a single
file that operates as both, but (a) new features should default to the
"safest" mode of operation, (b) this isn't python and so that's pretty rare
in practice, and (c) there was at least two work-arounds should that
behaviour actually be desired: (i) use an absolute path, or (ii) use PATH
instead of BASH_SOURCE_PATH.

4. When using "source -i", if BASH_SOURCE_PATH is unset or empty, it's
taken as equivalent to '.', so that it's useful to write "source -i
../lib/foo.bash" in a script at "$X/bin/bar" to load "$X/lib/foo.bash".

-Martin

PS: in the longer term, perhaps PATH could have similar behaviour, but
gated by a shopt or compat check.


Re: Re: Re: [PATCH 0/4] Add import builtin

2024-05-07 Thread Martin D Kealey
On Sun, 5 May 2024 at 11:50, Koichi Murase  wrote:

> > Ideally, they'll be using bash's native import under the hood!
>
> Yes, module managers still need to implement their own "import"
> command while using the proposed "import" primitive under the hood,
> and it's simply interchangeable with the source builtin we already
> have.
>
> * Module managers typically try to identify the file under more detailed
> rules [...] For these reasons, most module managers actually resolve the
> path by itself using their rules and specify *the absolute path* to the
> source builtin. As far as the absolute path is specified, there is no
> difference between the source builtin and the suggested `source -i' or
> `import'.
>

I wonder if it would be useful to add options to 'command':
* '-o' would report only the first command found (when more than one is
given) (or could be '-1');
* '-p' would skip builtins &  functions, and fail silently if no file can
be found;
* '-x' would search for files that *lack* exec permission.

Then a module loader could simply be:

require() {
  [[ ${__loaded_from[$1]} ]] && return   # assumes: declare -A __loaded_from
  local rp
  rp=$(
PATH=$LIB_PATH \
command -opvx "$1".bash "$1".sh "$1"
  ) &&
  source "$rp" &&
  __loaded_from[$1]=$rp
}
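
Usage might then look like this (hypothetical module name; the associative
array and search path need setting up first):

  declare -A __loaded_from=()
  LIB_PATH=$HOME/lib/bash
  require foo   # sources $HOME/lib/bash/foo.bash (or foo.sh, or foo) once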

(Being a Perl monk, I like the distinction between 'require' that has
effect during run time, and 'use' that has effect during parsing. And
'import' is the process for binding names into namespaces, not the process
for loading files.)

-Martin


Readdelim (was Re: Examples of concurrent coproc usage)

2024-04-28 Thread Martin D Kealey
On Sun, 28 Apr 2024, 05:03 Carl Edquist,  wrote:

>
> > I would hope that mapfile/readarray could do better, since it's not
> > obligated to leave anything in the input stream.
>
> That is an interesting thought, although mapfile seems to read a byte at a
> time also.
>
> [I'm not suggesting this should be changed though.  And frankly I have no
> strong desire for a faster 'read' builtin in the shell, either.  A byte at
> a time is relatively slow, but that's generally fine in most limited
> contexts where it's actually needed.]
>

I'm encountering contexts where the lack of speed is annoying though not
critical.

My thought is that (performance considerations aside), the real functional
> improvement with a new "readd" call would be with _competing_ readers
> (more than one read call waiting on the same pipe at the same time).
>

That's a very good point; I'll mention that when I write to the Linux
kernel team.

In that case a length-prefixed or type-tagged record wouldn't seem to work
> with the regular read(2), because a single reader would not be able to
> read the length/type _and_ the corresponding record together.  You can't
> work around this by reading a byte at a time either.  That's why I said it
> would only seem to work (with read(2)) if the records have a fixed size.
> (In order to grab a whole record atomically.)
>

This makes me wonder whether readv() could be encouraged to handle a
length prefix simply by targeting the length field of a subsequent element
of the iovec array:

  iov[0].iov_base = &iov[1].iov_len;
  iov[0].iov_len  = sizeof iov[1].iov_len;
  iov[1].iov_base = malloc(MAX_RECORD_SIZE);

I suspect this won't work today because the kernel probably computes the
total read size up front. But maybe some day?

-Martin


Re: [Help-bash] difference of $? and ${PIPESTATUS[0]}

2024-04-22 Thread Martin D Kealey
On Mon, 22 Apr 2024, 18:13 felix,  wrote:

> Hi,
>
> Coming on this very old thread:
>
> [the] man page say[s]:
>
> PIPESTATUS
>  An  array  variable (see Arrays below) containing a list of exit
>  status values from the processes in  the  most-recently-executed
>  foreground pipeline (which may contain only a single command).
>
>  ?   Expands  to  the exit status of the most recently executed fore‐
>  ground pipeline.
>
> If so, "$?" have to be equivalent to "${PIPESTATUS[0]}", I think.
>
> I suggest that man page should be modified to replace "foreground pipeline"
> by "command" under "?" paragraph.


Ironically the description of ? is correct (subject to understanding shopt
-u lastpipe), but the description of PIPESTATUS is incongruent with the
meaning of "pipeline": the status of a compound command is the status of its
last inner command, not counting any command whose status is checked by the
compound command itself (so commands immediately followed by ";do", ";then",
or ";else" do not contribute to the status of the compound command).

On the other hand, PIPESTATUS is set after a simple command, ignoring any '!'
or 'time' prefix, and after any (explicit or implicit) subshell, because a
subshell's exit status is reported via exit+wait in the same manner as a
simple command.

That in turn implies that it will be set after any non-trivial pipeline,
because that forces each of its parts to be executed as a subshell.
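
A quick illustration of the distinction (both values must be captured at
once, since almost any subsequent command overwrites them):

  false | true
  status=$? pipes=("${PIPESTATUS[@]}")
  echo "\$?=$status PIPESTATUS=(${pipes[*]})"   # $?=0 PIPESTATUS=(1 0)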

-Martin


Re: Examples of concurrent coproc usage?

2024-04-22 Thread Martin D Kealey
On Mon, 22 Apr 2024, 09:17 Carl Edquist,  wrote:

> When I say "token" I just mean a record with whatever delimiter you're
> referring to using.


Ok that makes sense.

 Assuming the reading stops after consuming the first delimiter (which is
> necessary for the 'read' builtin), then you end up with one system call per
> line or record or token or whatever you want to call it.
>

That's what I was initially thinking of, but now I wonder whether the new
kernel call should also accept a record count.

[…] I was saying the shell is crippled when limited to builtins; eg, a
> read/printf loop compared to simply running cat.
>

I would hope that mapfile/readarray could do better, since it's not
obligated to leave anything in the input stream.

But yeah currently a pipe with a series of records and multiple
> cooperating/competing readers perhaps only works if the records have a
> fixed size. A new readd[elim] system call like you're talking about would
> allow safely reading a single variable-length record at a time.
>

There are other options, such as length-prefixed records, or tagged (typed)
records, but of course those aren't POSIX text files.

This starts to make me wonder whether mediated stdin could be more
efficient?

-Martin


Re: Examples of concurrent coproc usage?

2024-04-21 Thread Martin D Kealey
On Sun, 21 Apr 2024, 10:13 Carl Edquist,  wrote:

> On Thu, 18 Apr 2024, Martin D Kealey wrote:
> > Has anyone tried asking any of the kernel teams (Linux, BSD, or other)
> > to add a new system call such as readln() or readd()?
>
> You mean, specifically in order to implement a slightly-more-efficient
> 'read' builtin in the shell?
>

The read built-in in the shell is only one case that would benefit from
such a syscall.

The purpose would be to allow multiple processes to read in turn from a
consumable (or otherwise non seekable) input stream. In this context doing
a large block read() is exactly what we DON'T want to do, so we also can't
use a library function such as getline() that is built on top of such a
read().

By way of example, another use would be the "head" utility, which by using
such a syscall could consume only the bytes it outputs, leaving all other
bytes still in the input stream. This would be an improvement over the
current situation.

Basically any time you have cooperating processes reading delimited input,
this would be an improvement.
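
To see the difference today (GNU coreutils shown; buffering details vary):

$ seq 5 | { head -n 1; cat; }              # head slurps a whole block; cat gets nothing
1
$ seq 5 | { read -r x; echo "$x"; cat; }   # read consumes one byte at a time
1
2
3
4
5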

> I envisage this working like stty cooked mode works on a tty,
>
…

> One downside is you'd end up with a system call for each token


That's not how stty cooked mode normally works.

The typical use case is line-at-a-time, so this would reduce the number of
system calls by about 90% on a typical text input stream, more if there are
few or no blank lines.

However I would not hard code "newline" into the kernel, but rather allow
the user code to nominate a list of delimiters.

-Martin


Re: Examples of concurrent coproc usage?

2024-04-21 Thread Martin D Kealey
On Sat, 20 Apr 2024 at 01:14, Chet Ramey  wrote:

> On 4/17/24 8:55 PM, Martin D Kealey wrote:
> > Has anyone tried asking any of the kernel teams (Linux, BSD, or other) to
> > add a new system call such as readln() or readd()?
>
> They'd probably point you to an optimized version of getdelim/getline.
> You're just pushing the work down a layer.
>

Yes, that is exactly my point: move the work to where it can be done most
efficiently and most reliably.

This efficiency gain isn't some minor quibble; we're talking about a
multi-fold increase in performance, perhaps an order of magnitude speed-up
reading some text files.

(I've done some testing that indicates about a 90% reduction in time spent
in kernel calls, but depending on what else is done this might translate to
only a 3-fold improvement in practice. Or not be noticeable at all, if
you're running a slow command for each line.)
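
For anyone who wants to reproduce the comparison, a rough sketch (file name
illustrative; timings will vary):

$ seq 1000000 > /tmp/big
$ time bash -c 'while read -r line; do :; done' < /tmp/big   # ~one read(2) per byte
$ time cat /tmp/big > /dev/null                              # a few block-sized read(2)s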

-Martin


Re: Examples of concurrent coproc usage?

2024-04-17 Thread Martin D Kealey


On Wed, 17 Apr 2024, Chet Ramey wrote:
> On 4/16/24 2:46 AM, Carl Edquist wrote:
>
> > But the shell is pretty slow when you ask it to shovel data around like
> > this.  The 'read' builtin, for instance, cautiously does read(2) calls of a
> > single byte at a time.
>
> It has to do it that way to find the delimiter on a non-seekable file
> descriptor, since it has to leave everything it didn't consume available
> on stdin.

Has anyone tried asking any of the kernel teams (Linux, BSD, or other) to
add a new system call such as readln() or readd()?

I envisage this working like stty cooked mode works on a tty, except it
would also work on files, pipes, and sockets: you'd get back *at most* as
many bytes as you ask for, but you may get fewer if a delimiter is found.
The delimiter is consumed (and returned in the buffer), but everything
following a delimiter is left available for a subsequent read.

For a tty, code like

 tcsetattr(fd, TCSANOW, &(struct termios){ .c_lflag = ICANON, .c_cc = { [VEOL] = '\n' } });
 ssize_t n = read(fd, &buf, sizeof buf);

could become just

 ssize_t n = readd(fd, &buf, sizeof buf, "\n", 1, 0 /*flags*/);

or perhaps even

 ssize_t n = readdv(fd,
&buf, sizeof buf,
(struct iovec[]){{"\n", 1}}, 1,
0 /*flags*/);

I'm not sure whether multi-byte delimiters should be allowed, as it's
unclear what to do when you get an incomplete delimiter at the end of the
buffer, but an iovec interface would at least allow that as a future
possibility.

As Linux kernel developers have found, it's better to *always* include a
flags argument, even if you can't think of a use for it yet; but in this
case O_NONBLOCK and O_PEEK could immediately be useful.

-Martin

PS: I initially wondered about having

 ssize_t n = readvdv(fd,
 (struct iovec[]){{&buf, sizeof buf}}, 1,
 (struct iovec[]){{"\n", 1}}, 1);

but a vectored read isn't much of a saving unless the overall size is very
large, and enormous reads probably shouldn't use this facility since the
in-kernel byte scan would potentially block reads and writes by other
processes.

For the same reason I would allow the read size to be silently capped at a
value chosen by the kernel, probably a small multiple (1?) of the IO block
size or the pipe buffer size or the tty input buffer size.



Re: [PATCH v2 04/18] doc/bash.1: improve typography of ellipses

2024-04-11 Thread Martin D Kealey
On Thu, 1 Feb 2024 at 07:54, G. Branden Robinson <
g.branden.robin...@gmail.com> wrote:

> v2: Prevent conflict with PATCH v2 01/18.
> Apply ellipsis advice from groff_man_style(7).
> • The dummy character escape sequence \& follows the ellipsis when further
> text will follow after space on the output line, keeping its last period
> from being interpreted as the end of a sentence and causing additional
> inter‐sentence space to be placed after it.
>

Is there a reason why we're still using a triple period/full-stop “...”
(\u002e) instead of an actual ellipsis “…” (\u2026)?

-Martin


Re: Parsing regression with for loop in case statement

2024-04-10 Thread Martin D Kealey
I can confirm that this changed between 4.4.23(49)-release and
5.0.0(1)-beta, which coincides with the parser being largely rewritten.

On Thu, 11 Apr 2024 at 12:51,  wrote:

> The POSIX shell grammar specifies that a newline may optionally appear
> before the in keyword of a for loop.


I don't see that at §2.9.4 "The for Loop" (
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_03)
and I've never seen it in the wild.

But ... oh look, it's mentioned in §2.10.2 (
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10_02
).
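
For reference, my reconstruction of the shape of the construct at issue,
with the newline before «in»:

case x in
  x) for i
     in 1 2 3; do echo "$i"; done ;;
esac

Bash 4.4 accepts this; 5.0 reports a syntax error.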

I wonder when that was added, and why?

-Martin


Re: Potential Bash Script Vulnerability

2024-04-08 Thread Martin D Kealey
Hmm, looks like I'm partially mistaken.

Vim never does the inode pivot trick *in circumstances where I might've
noticed*, so not when the file:
- has multiple links, or
- is a symlink, or
- is in an unwritable directory, or
- otherwise appears to be something other than a plain file.

But it turns out it does pivot the inode when it thinks it won't be
noticed, which makes sense because it's less risky than overwriting a file
(which could result in data loss if the write fails).

So I've learned something new, thankyou.

-Martin

On Tue, 9 Apr 2024 at 11:13, Kerin Millar  wrote:

> On Tue, 9 Apr 2024 10:42:58 +1200
> Martin D Kealey  wrote:
>
> > On Mon, 8 Apr 2024 at 01:49, Kerin Millar  wrote:
> >
> > > the method by which vim amends files is similar to that of sed -i.
> > >
> >
> > I was about to write "nonsense, vim **never** does that for me", but
> then I
> > remembered that using ":w!" instead of ":w" (or ":wq!" instead of ":wq")
> > will write the file as normal, but if that fails, it will attempt to
> remove
> > it and create a new one. Ironically, that's precisely one of the cases
> > where using "sed -i" is a bad idea, but at least with vim you've already
> > tried ":w" and noticed that it failed, and made a considered decision to
> > use ":w!" instead.
> >
> > Except that nowadays many folk always type ":wq!" to exit vim, and never
> > put any thought into this undesirable side effect.
> >
> > I put that in the same bucket as using "kill -9" to terminate daemons, or
> > liberally using "-f" or "--force" in lots of other places. Those  are bad
> > habits, since they override useful safety checks, and I recommend making
> a
> > strenuous effort to unlearn such patterns. Then you can use these
> stronger
> > versions only when (1) the soft versions fail, and (2) you understand the
> > collateral damage, and (3) you've thought about it and decided that it's
> > acceptable in the particular circumstances.
> >
> > -Martin
> >
> > PS: I've never understood the preference for ":wq" over "ZZ" (or ":x"); I
> > want to leave the modification time unchanged if I don't edit the file.
>
> Alright. In that case, I don't know why I wasn't able to 'inject' a
> replacement command with it. I'll give it another try and see whether I can
> determine what happened.
>
> --
> Kerin Millar
>


Re: Potential Bash Script Vulnerability

2024-04-08 Thread Martin D Kealey
On Mon, 8 Apr 2024 at 01:49, Kerin Millar  wrote:

> the method by which vim amends files is similar to that of sed -i.
>

I was about to write "nonsense, vim **never** does that for me", but then I
remembered that using ":w!" instead of ":w" (or ":wq!" instead of ":wq")
will write the file as normal, but if that fails, it will attempt to remove
it and create a new one. Ironically, that's precisely one of the cases
where using "sed -i" is a bad idea, but at least with vim you've already
tried ":w" and noticed that it failed, and made a considered decision to
use ":w!" instead.

Except that nowadays many folk always type ":wq!" to exit vim, and never
put any thought into this undesirable side effect.

I put that in the same bucket as using "kill -9" to terminate daemons, or
liberally using "-f" or "--force" in lots of other places. Those  are bad
habits, since they override useful safety checks, and I recommend making a
strenuous effort to unlearn such patterns. Then you can use these stronger
versions only when (1) the soft versions fail, and (2) you understand the
collateral damage, and (3) you've thought about it and decided that it's
acceptable in the particular circumstances.

-Martin

PS: I've never understood the preference for ":wq" over "ZZ" (or ":x"); I
want to leave the modification time unchanged if I don't edit the file.


Re: Examples of concurrent coproc usage?

2024-04-04 Thread Martin D Kealey
I'm somewhat uneasy about having coprocs inaccessible to each other.
I can foresee reasonable cases where I'd want a coproc to utilize one or
more other coprocs.

In particular, I can see cases where a coproc is written to by one process,
and read from by another.
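
A minimal sketch of the sort of plumbing I have in mind, which the current
auto-close behaviour defeats (names illustrative):

coproc UPPER { tr a-z A-Z; }
coproc TALLY { wc -c; }          # bash already warns about a second coproc
# a third process bridging the two; both fds are auto-closed in the subshell:
( cat <&"${UPPER[0]}" >&"${TALLY[1]}" ) &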

Can we at least have the auto-close behaviour be made optional, so that it
can be turned off when we want to do something more sophisticated?

-Martin


Re: Manual: clarify what POSIX stands for

2024-03-31 Thread Martin D Kealey
On Thu, 25 Jan 2024, 20:04 Alan Urmancheev,  wrote:

> Currently, Bash's manual definitions section mentions POSIX, but doesn't
> explain what that abbreviature stands for

 ...

> I think that abbreviatures can be confusing, especially when you don't get
> to know what they stand for.
>

I suspect this confusion arises from a pattern that's common in some other
languages but not in English. In English a name generally does not "mean"
anything (*1); and most native speakers generally feel no compelling desire
to dissect a name to figure out its "meaning". (Heck, we don't even dissect
idiomatic phrases into their separate words, leading to English being
mildly agglutinative. (e.g. "hairdo", "login", "setup", "today".))

The phrase "Portable Operating System Interface" is *less* meaningful to
most English speakers, and in practice is only used to answer the question
"what does POSIX stand for". (That's why the Wikipedia title «Portable
Operating System Interface
»
redirects to "POSIX" and not the other way around.)

> Therefore, I propose to add the meaning of the abbreviature to the manual.


That's going to be tricky, since like most English names, it does not *mean*
anything. Rather, it has a referent, which is ISO/IEC 9945.

The name "POSIX" was adopted largely because it was more memorable and
easier to pronounce than alternatives that were suggested at the time, and
forty years later that's the name it's universally known by. The history is
a bit unclear on this point, but it seems likely that POSIX was coined
first, and then the retronym "Portable Operating System Interface" was
coined to match it a few minutes later.

Most English speakers find "explanations of names" to be distractions
rather than helping, so if you REALLY want to add this, can it please NOT
be right next to the first use of the word "POSIX". For example as an
end-note. (If this were MarkDown or HyperText, I'd say "put a link and
nothing else", but unfortunately man pages are written in ROFF, so links
aren't easily accessible.)

> («portable operating system interface X», or something like that).


Close, but no; the «X» does not abbreviate anything; it's there because in
the 1980's it was customary for Unix-like operating systems to have
block-capital names ending with «IX». Maybe *that* should go in the
explanation.

I suggest an explanation more along the lines of «POSIX is a suite of
standards endorsed jointly by the International Organization for
Standardization (ISO.org) and the International Electrotechnical Commission
(IEC.ch) as ISO/IEC 9945. The current revision is POSIX.1-2017, based on
POSIX.1-2008 (ISO/IEC/IEEE 9945:2009) with technical corrigenda. Further
information is available at https://en.wikipedia.org/wiki/POSIX».

-Martin (Martin's Adroit Recursive Turing Implementing Noggin) (*3)

PS:
(*1) I dare you to ask what "GNU" stands for. Or "UNIX".

(*2) no, we don't insert "ISO is short for blah blah", because that is
*also* not part of the name of the standard.

(*3)
https://books.google.com/ngrams/graph?content=Posix%2CPOSIX%2CPortable+Operating+System%2CPortable+Operating+System+Interface&year_start=1960&year_end=2019&corpus=en-2019&smoothing=3



Bug tracking

2024-03-31 Thread Martin D Kealey
On Mon, 11 Dec 2023, 05:19 Chet Ramey,  wrote:

> On 11/30/23 5:18 AM, Martin D Kealey wrote:
>
> > If there's a bug tracking system beyond "threads in a mailing list", I'd
> > like to know how I can get access to it.
>
> https://savannah.gnu.org/support/?group=bash


That's a good start, but it seems incomplete, and there's little -- perhaps
no -- overlap with bug reports in this list.

Has bashbug always sent email to bug-bash@gnu.org, or was it previously fed
into Savannah?

Savannah seems too simplistic, as it ignores what happens after code is
written, and fails to distinguish other important steps. (Among other
things: it lacks a field for "intended release" and lacks numerous status
options including "awaiting design review", "awaiting code review", "ready
for release", & "release scheduled"; there's no link between bug reports
and the source repo (nor any management of pull requests); and it doesn't have
any form of continuous testing or integration.) And of course, it's
disconnected from this mailing list.

Perhaps bug tracking could be migrated to a more modern system? I know that
(for good reason) GNU projects won't use proprietary services like
Bitbucket or Github, but perhaps Bugtraq or Gitlab would be acceptable, or
maybe some other project management tool?

Aside from the shortcomings of Savannah itself, I'm concerned that bug
reports in Savannah can only be assigned to three people, two of whom have
been inactive for years. (I'm guessing they're the same people who can
update bugs.) Are we building a cathedral with gatekeepers, or a bazaar
where the masses can contribute?

-Martin

PS: bashbug announces «Bashbug is used to send mail to the Bash
maintainers», and then addresses the message to *this* mailing list. I read
that as "this list is only for maintainers". (And yes, I distinguish
between subscribers, who receive messages from the list, and submitters,
who send messages to the list.)

Somewhere along the line this list has ceased to fulfil that role, instead
becoming an informal bash testers list, while the actual
bash-test...@cwru.edu list doesn't rate a mention here, and moreover
attempting to subscribe by sending to bash-testers-requ...@cwru.edu bounces
(550 5.1.1 User unknown). If the testers list is elsewhere, bashbug needs
updating.


Re: Debian bug #929178: wrong trap displayed inside functions

2024-03-26 Thread Martin D Kealey
On Tue, 26 Mar 2024 at 04:05, Oğuz  wrote:

> On Mon, Mar 25, 2024 at 8:38 PM G. Branden Robinson
>  wrote:
> > [1]
> > [1] http...
>
> I keep seeing this. Why don't you guys just paste the link?
>

When forwarding incoming HTML to a text-only list, most mailing list
servers will put the hyperlinks in a footnote, so that long links
won't obscure the text they apply to. (The better ones only do this when
the text and its hyperlink differ; if the text and its hyperlink match
(as when a link's visible text is the URL itself) then nothing is gained
by duplicating it. That would appear to be the case with this mailing list.)

Some mail receivers (e.g. Gmail) will convert bare text that looks like a
hyperlink back into a hyperlink, which is how you get clickable links in
the footnote.

-Martin


Re: Add option to just print history, with no added timestamps or line numbers

2024-03-24 Thread Martin D Kealey
Hi Dan

How about « fc -ln » ?
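
For example (from memory, fc's listing is still indented, hence the sed):

$ fc -ln 1 | sed 's/^[[:space:]]*//'    # whole history, no numbers or timestamps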

It might be helpful to have explicit cross-references between the help
displays for «history» and «fc».

-Martin

On Sun, 24 Mar 2024 at 15:40, Dan Jacobson  wrote:

> $ help history
> should mention how in the world one is supposed to just print the plain
> history,
> without any line numbers or time stamps.
>
> You might say, "Just strip it off with perl or sed." Well, fine. Then
> mention that in help history.
>
> Currently one needs massive superfund environmental clean-up effort
> workarounds, e.g.,
> $ HISTTIMEFORMAT=' ' history | perl -pwle 's/^\s+\S+\s+//'
>
> Better yet, add a
> history -j: Just print history, with no added timestamps or line numbers,
> etc.
>
> Or OK, there's HISTTIMEFORMAT. How about also a HISTFORMAT (default " %5n
> %t %h\n" or whatever,) so one could use "%h\n" for "just give me the
> history item."
>
>


Re: multi-threaded compiling

2024-03-12 Thread Martin D Kealey
> On Mon, Mar 11, 2024 at 8:20 PM Chet Ramey  wrote:
> > On 3/11/24 2:50 PM, Mischa Baars wrote:
> > > Which sort of brings us back to the original question I suppose. Why
> > > does that line of code function from a script and why does it fail from
> > > the command line?
> >
> > Job control and when the shell notifies the user about job completion,
> > most likely, two of the relevant things that differ between interactive
> > and non-interactive shells.
>

In this case, no.

I inserted « echo $- ; shopt -p ; shopt -po ; » in front of each case, and
the ONLY difference was that « echo $- » reported “hxBc” vs “hxB”. Not an
“m” in sight. And no “i” or “l” either. (The “c” was expected, given how «
make » invokes the shell.)

-Martin


Re: multi-threaded compiling

2024-03-12 Thread Martin D Kealey
In section one, the problem is that "wait -n" does not do what you think it
does. (Lots of us think this behaviour is broken, and it may be fixed in an
upcoming version of Bash.) You don't need '-n' when you specify a PID; the
fix is simply to remove it.
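
That is, something like:

long_task &                 # "long_task" is illustrative
pid=$!
wait "$pid"                 # waits for that specific process; no -n needed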

In section two, the problem is that quotes are recognized BEFORE variables
are expanded, so quote marks produced by an expansion are treated as literal
data rather than as quoting. Therefore writing VAR=" \"string 1\" \"string 2\" "
absolutely cannot do what you might expect: the embedded quote marks will
be passed through literally, and then (because ${CFLAGS[0]} is not quoted) the
resulting string will be split on any embedded whitespace.
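
A sketch of the usual fix: keep the words in an array, so the C-level quotes
live inside single-quoted shell words (macro names illustrative):

CFLAGS=( -O2 -DGREETING='"string 1"' -DFAREWELL='"string 2"' )
gcc "${CFLAGS[@]}" -c main.c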


On Mon, 11 Mar 2024 at 18:56, Mischa Baars 
wrote:

> Hi,
>
> I'd like to compile a project of mine using multiple logical cores.
>
> I've attached the problem. It consists of two parts:
>
> 1) multi-threaded bash script and / or multi-threaded Makefile
>
> Running bash script functions as expected, but executing the same line of
> code with make and / or the command line, does not function. Perhaps
> someone could explain to me why?
>
> 2) passing a string argument from a bash script and / or Makefile to the
> gcc -D option
>
> Running the makefile functions as expected, but I have not been able to get
> similar code to work from a bash script. Can someone please explain to me
> what I'm doing wrong?
>
> Hope to hear from you soon.
>
> Best regards,
> Mischa Baars.
>


Re: human-friendly ulimit values?

2024-02-28 Thread Martin D Kealey
Personally I don't have any problem with 8000000 kB == 8 GB or 104857600
KiB == 100 GiB, but it's not as if having nice round power-of-two numbers
really matters in *this* case, where 1000000000 KiB is close enough to 1
TiB. But I guess not everyone is as comfortable with mental arithmetic.

On Thu, 29 Feb 2024 at 13:59, Dale R. Worley  wrote:

> What position does it take on the "GB" vs. "GiB" business?
>

For that matter, what position does it take on megabytes (MB) vs millibits
(mb)?

Should fractions be allowed? If so, how should they be rounded?
When should values be displayed as MiB or GiB or TiB? With fractions, or
rounded?
Should octal or hexadecimal be allowed (since they make it easier to express
powers of two)?

-Martin


Re: declare -f does not output esac pattern correctly

2024-02-28 Thread Martin D Kealey
On Tue, 27 Feb 2024 at 18:48, Oğuz  wrote:

> On Tuesday, February 27, 2024, Martin D Kealey 
> wrote:
>
>> I've been thinking for a while now that POSIX made a mistake when it
>> permitted ';;' before the closing 'esac'.
>>
>
> I think that decision was made before POSIX. Besides it's handy when
> generating case clauses on the fly, you may not always know which case is
> going to be the last. You may not always know that at least one clause is
> going to be generated either, but `case x in esac' is valid, so it's not a
> problem. The syntax for the case command is neat as-is.
>

Oh I'm well aware that it's "easier for humans" to have a consistent
terminator, but it's a horrible wart from a parsing point of view. There's
exactly one word that cannot be an unquoted pattern, and that's crazy.

Another approach could have been to have an initiator keyword or symbol,
rather than a terminator.
If you squint closely, 'in' and ';;' have pretty much the same purpose, so
if we were simply replace 'in' with ';;' we would get:

case $thing
  ;; a) echo A
  ;; b) echo B
  ;; c) echo C
esac

Which is nice and regular, just like you're asking for.

And arguably it makes more sense for the flow-through symbol to be as
obvious as possible:

case $thing
  ;;  *a*) echo has A
  ;;& *b*) echo has B
  ;&  *c*) echo has B or C
  ;;  *d*e* | *e*d* ) echo has D and E
esac

-Martin

PS: yes that's even more ugly than the current syntax, but it *is* more
regular. And no, I don't think we can convince POSIX to allow it.


Re: declare -f does not output esac pattern correctly

2024-02-27 Thread Martin D Kealey


I've been thinking for a while now that POSIX made a mistake when it
permitted ';;' before the closing 'esac'. If ';;' were prohibited there,
then the parser could be sure that the next word after every ';;' would be a
pattern, even if it looks like 'esac'. But as things stand, there's an
ambiguity which has traditionally been resolved by assuming an unquoted
'esac' occurring in the 'expect-a-pattern' state is actually a
case-statement terminator. (And don't get me started on the stupidity of
intentionally mismatched parentheses.)

But that's all water under the bridge.

Prior to version 5.2 of Bash, even inserting '(' before esac wasn't enough
to hide it:

$ bash-5.1.12p1-release -c 'a () { case $1 in (esac) echo esac ; esac } ; type a'
bash-5.1.12p1-release: -c: line 1: syntax error near unexpected token `esac'
bash-5.1.12p1-release: -c: line 1: `a () { case $1 in (esac) echo esac ; esac } ; type a'
$ bash-5.2.0p1-alpha -c 'a () { case $1 in (esac) echo esac ; esac } ; type a'
a is a function
a ()
{
case $1 in
esac)
echo esac
;;
esac
}

A better approach might be simply to quote 'esac' as a pattern word in the
output of declare or type. Herewith a patch that fixes both annoyances:

$ build/bash
$ ./bash
$ a() { case $1 in "") echo None ;; (esac) echo Esac ; esac }
$ shopt -p balanced_case_parens
shopt -u balanced_case_parens
$ type a
a is a function
a ()
{
case $1 in
"")
echo None
;;
\esac)
echo Esac
esac
}
$ shopt -s balanced_case_parens
$ type a
a is a function
a ()
{
case $1 in
("")
echo None
;;
(esac)
echo Esac
esac
}
$ git d devel..@
diff --git a/builtins/shopt.def b/builtins/shopt.def
index b3e1cfe5..0a385a58 100644
--- a/builtins/shopt.def
+++ b/builtins/shopt.def
@@ -75,6 +75,7 @@ $END
 #define OPTFMT "%-15s\t%s\n"

 extern int allow_null_glob_expansion, fail_glob_expansion, glob_dot_filenames;
+extern int balanced_case_parens;
 extern int cdable_vars, mail_warning, source_uses_path;
 extern int no_exit_on_failed_exec, print_shift_error;
 extern int check_hashed_filenames, promptvars;
@@ -182,6 +183,7 @@ static struct {
   { "array_expand_once", &expand_once_flag, set_array_expand },
   { "assoc_expand_once", &expand_once_flag, set_array_expand },
 #endif
+  { "balanced_case_parens", &balanced_case_parens, (shopt_set_func_t *)NULL },
   { "cdable_vars", &cdable_vars, (shopt_set_func_t *)NULL },
   { "cdspell", &cdspelling, (shopt_set_func_t *)NULL },
   { "checkhash", &check_hashed_filenames, (shopt_set_func_t *)NULL },
diff --git a/print_cmd.c b/print_cmd.c
index 330223d3..892443ab 100644
--- a/print_cmd.c
+++ b/print_cmd.c
@@ -52,6 +52,8 @@ extern int printf (const char *, ...);/* Yuck.  Double yuck. */
 static int indentation;
 static int indentation_amount = 4;

+int balanced_case_parens = 0;
+
 typedef void PFUNC (const char *, ...);

static void cprintf (const char *, ...)  __attribute__((__format__ (printf, 1, 2)));
@@ -771,6 +773,11 @@ print_case_clauses (PATTERN_LIST *clauses)
   if (printing_comsub == 0 || first == 0)
newline ("");
   first = 0;
+  if (balanced_case_parens)
+    cprintf("(");
+  else if (!strcmp(clauses->patterns->word->word, "esac"))
+    cprintf("\\");
+
   command_print_word_list (clauses->patterns, " | ");
   cprintf (")\n");
   indentation += indentation_amount;
@@ -781,7 +788,7 @@ print_case_clauses (PATTERN_LIST *clauses)
newline (";&");
   else if (clauses->flags & CASEPAT_TESTNEXT)
newline (";;&");
-  else
+  else if (clauses->next) /* be unambiguous: omit last ';;' */
newline (";;");
   clauses = clauses->next;
 }

-Martin



Re: Bug: Ligatures are removed as one character

2024-02-25 Thread Martin D Kealey
On Fri, 23 Feb 2024, Chet Ramey wrote:
> On 2/19/24 9:26 PM, Avid Seeker wrote:
> > When pressing backspace on Arabic ligatures (including characters with
> > diacritics), they are removed as if they are one character.
>
> As you might guess, readline doesn't know much about Arabic, per se. In a
> UTF-8 locale, for example, it knows base characters and combining
> characters.
>
> The idea is simple: when moving backwards, move one multibyte character at
> a time, ignoring combining characters, until you get to a character for
> which wcwidth(x) > 0, and move point there. The algorithm for moving
> forward is similar.
>
> How should this be modified to support Arabic in a portable way?

Unicode has categories for "modifiers" (especially "modifier letters") and
for "combining characters". Note that each symbol can be in multiple
categories.

Modifiers change how another character is displayed. They may or may not be
considered to have their own separate semantic meaning. In the simple cases
they simply over-print an additional mark, but more complex adjustments are
possible. They don't normally change the overall size of the modified
character, so wcwidth(ch) will report zero.

What matters is that "combining characters" do not have stand-alone semantic
meaning; they should be erased along with the principal character. Accents
in European languages (and Thai) tend to be in this category.

To a first approximation, backspace should skip over the latter but not the
former. However if you've just removed a zero-width element, it would be
advisable to either re-render the whole line, or backspace over the last
full glyph, erase it, and re-render it with all its (remaining) modifiers.

https://stackoverflow.com/questions/54450823/what-is-the-difference-between-combining-characters-and-modifier-letters

On systems that need to dynamically load a shared library (libunicode.so?)
to support this, I suggest delaying doing so until it's needed -- after
setlocale("something.UTF-8") returns success, or some equivalent test. (I
hope there's a check that can be done against the already-loaded locale,
rather than inspecting the locale name as a string.)

-Martin



Re: It is possible to remove the readonly attribute from {BASH, SHELL}OPTS

2024-02-22 Thread Martin D Kealey
On Wed, 21 Feb 2024 at 08:09, Chet Ramey  wrote:

> On 2/20/24 4:11 AM, Martin D Kealey wrote:
> > Ideally each function invocation would have its own variable namespace,
> > only using the global namespace as a fall-back, but that creates
> > complications with exported variables, so let's take baby steps to get
> there.
>
> That doesn't work with dynamic scoping at all.
>

Yes, that's exactly the point, to *avoid* dynamic scoping. I want the
equivalent of Perl's "my", rather than Perl's "local".

Perhaps I should clarify that I'm using "global namespace" to mean what
Perl calls a "package" (except Bash has only one), and that it *includes*
the dynamic effects of dynamic "local" statements. Therefore "using the
global namespace as a fall-back" means precisely that variables that are
not explicitly lexically scoped would continue to behave as they do now.

Code that currently relies on dynamic scoping would continue to work, while
new code can avoid the craziness that comes from "everything is global,
even when we claim it's local" and "unset can even mess with the poor
protection afforded by 'local'".
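
To make the distinction concrete ('my' here is the hypothetical lexical
keyword, not an existing builtin):

x=global
f() { local x=dynamic; g; }
g() { echo "$x"; }
f   # prints "dynamic" under today's dynamic scoping;
    # a lexical 'my x=dynamic' in f would leave g printing "global"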

-Martin

PS:
Lexical variable scope implies that nested functions require multiple
'local' symbol tables to be active concurrently, effectively as in a chain.
That in turn means the local variables associated with a function
invocation would need to outlive that invocation if there are lexically
nested functions that access them.

If lexical variable declarations were declarative (immediately effective
during parsing) rather than procedural (only becoming effective when the
flow of control passes over them), that would be advantageous: a single
local symbol table could be kept with the parse tree for each function,
with an index into a per-invocation storage array. Then some time in the
future, symbol look-ups could be done during the parsing phase, leaving a
'local reference by index' entry in the parse tree. (Having a fixed symbol
table implies that the behaviour of the 'unset' command would subtly
change; for the global symbol table it would continue to behave as now, but
for the symbol tables associated with functions, it would place an 'unset'
marker in the slot, rather than deleting the name. There may be corner
cases where that's a detectable change, but since it's opt-in I think
that's acceptable. And that means some new mechanism to implement 'upvar'
would be needed.)

Being effective during parsing would argue for 'local' being a keyword,
albeit one that mostly behaves as if it's a command, but I can see that
some would argue that it's too much change, so I wouldn't object to leaving
'local' as it is and defining a new keyword for this purpose; my preference
would be one of 'my', 'var', or 'use local' (where 'use' is a general
'during parsing' keyword).


Re: [PATCH] retry opening startup files on EINTR

2024-02-20 Thread Martin D Kealey
On Wed, 21 Feb 2024 at 02:37, Grisha Levit  wrote:

> sigaction(2) says:
>
> The affected system calls include open(2), read(2), write(2),
> sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications
> channel or a slow device (such as a terminal, but not a regular file)
>
> so I guess a SIGWINCH during the open(2) for ~/.bash_profile, etc. can
> still get interrupted.
>

In most cases no, but "regular" files within mounted filesystems that
themselves have a "slow" communication channel may be deemed "slow" or
"fast" depending on the FS type and the mount options. This is especially
true for NFS (on any platform) and FUSE (on Linux); I suspect it affects
CIFS but I'd have to check.

-Martin


Re: Bug: Ligatures are removed as one character

2024-02-20 Thread Martin D Kealey
It's been a long time since I looked into Unicode, but this is what I
remember.

Depending on the Unicode normalisation level, backspace is *supposed* to
remove a letter and all its associated combining marks.

The root problem seems to be that some Arabic letters change from
"non-combining" to "combining" depending on the language in which they're
used. Unicode also has a problem distinguishing a combining letter (vowel
points in Arabic or Hebrew) from a combining diacritic (accents in Latin
script).

If you think that's a bug in Unicode, you're not alone; the Unicode
consortium has been struggling with this for at least ten years - see
https://unicode.org/L2/L2014/14109-inline-chars.pdf

There's been some progress; Unicode version 12 has at least admitted
there's a problem (https://www.unicode.org/versions/Unicode12.1.0/ch07.pdf
chapter 7.9 page 327).

I'll leave it to others to survey the current state of play with Unicode,
but historically it's been a mess.

-Martin


On Tue, 20 Feb 2024 at 12:26, Avid Seeker 
wrote:

> When pressing backspace on Arabic ligatures (including characters with
> diacritics), they are removed as if they are one character.
>
> Example:
>
> السَّلامُ
>
> Pressing 3 backspaces leaves the word at ال. It removed لا which is a
> ligature
> combining "ل" and "ا", and removed "م" with diacritics. Compare this with
> the
> behavior of zsh.
>
> For non-Arabic speakers, this is like typing: fi (U+0066 U+0069), but when
> pressing backspace it removed it as the character: fi (U+FB01).
>
>


Re: It is possible to remove the readonly attribute from {BASH, SHELL}OPTS

2024-02-20 Thread Martin D Kealey
On Sat, 17 Feb 2024 at 02:32, Chet Ramey  wrote:

> Let's say we take the approach of restricting attribute changes on readonly
> variables to export/trace/local.
>
> Should it be an error to attempt to set other attributes (it already is
> with nameref), or should declare silently ignore it?
>

I would prefer to make "local" behave as much as possible like a true
lexically scoped declaration in "regular" languages.

Much as I hate the effect on backwards compatibility, I hate the current
situation even more: it's not possible to write a re-usable general-purpose
utility function because the function has to avoid overriding outer
variables that might differ in their attributes from what the function
needs. Arrays and read-only are particularly problematic.
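
A minimal example of the collision:

outer()  { local -r x=1; helper; }
helper() { local x=2; }    # bash: local: x: readonly variable
outer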

So yes please, I'd like "local" to push a new variable definition that
inherits nothing from any outer one: not name-ref, not read-only, not array
(of any kind), not assignment modifiers (integer, upper-case, lower-case),
and above all, not any previous values.

Ideally each function invocation would have its own variable namespace,
only using the global namespace as a fall-back, but that creates
complications with exported variables, so let's take baby steps to get
there.

Maybe this would be twisting 'local' too much, and it warrants creating a
new keyword such as 'var'?

-Martin


Re: About `M-C-e` expand result `'` failed

2024-02-04 Thread Martin D Kealey
On Sat, 3 Feb 2024 at 07:21, Chet Ramey  wrote:

> On 2/2/24 3:36 PM, Zachary Santer wrote:
> >  Ultimately, what I'm saying is that a different bindable function that
> performs all the shell expansions other than quote removal would be more
> useful than shell-expand-line.
>
> OK, I'll take that as a feature request for a future version.
>

If I might make some related feature requests;
1. Please can the "strip quotes" functionality be its own separate bindable
function;
2. Please make the re-quoting  smart enough to handle "!" when history
expansion is enabled;
3. Please apply re-quoting after history expansion when histreedit is in
effect, so that '!' resulting from a history expansion doesn't trigger
*another* history expansion.

As I detailed earlier ...

> By all means, add a "strip-quotes" command to readline, so the user can
> use in the exceptional cases where you want to diverge from what the shell
> would have done without M-C-e, but by default any expansion or substitution
> that's triggered by a readline command should render a result that's immune
> to that same expansion being done again when the user hits enter.
>
> Which quotes to reinstate probably depends on which expansions have
> already been done. This includes history expansion when histreedit is on;
> for example, this is unhelpful:
>
> $ ! echo Hi
> Hi
> $ echo !:0!$
> [expands history and re-loads input buffer]
> $ echo !Hi
> bash_5.1.4p47-release: !Hi: event not found
>
> I suggest that the history expansion should note the result of an
> expansion includes a history expansion character ("!" by default), and
> apply a modification if it would be recognized as such (followed by
> [[:alnum:]_:?%*$-], and not in single quotes).
>
> That modification would be:
>  - outside quotes, insert \ before a recognized history character; if
> necessary mark this '\' byte to prevent it from being doubled by subsequent
> expansions
>

(it's not clear to me whether this can actually occur, but I raise the
issue so it doesn't get overlooked if it is actually necessary)


>  - inside single quotes, leave alone;
>  - inside double quotes, insert "" (two double-quote characters) after it.
>
> Similar safeguards would be needed after any other kind of expansion,
> except that it suffices to treat "!" as a
>
[generic meta-character]

> character for the purpose of deciding whether or not a word needs to be
> re-quoted. (As long as it's always single-quoted, or backslash-escaped, the
> history characters don't need any other special treatment.)
>
> One way to decide whether any history chars need protection would simply
> be to apply history expansion to the result of the requested expansion, and
> if that changes the text, something in it needs protecting. Ideally the
> history expansion logic would note the location of any history expansion
> characters, so that readline could use that as a hint of what to fix.
>


Re: About `M-C-e` expand result `'` failed

2024-02-03 Thread Martin D Kealey
On Sun, 4 Feb 2024 at 15:17, Koichi Murase  wrote:

> On Sun, 4 Feb 2024 at 12:59, Martin D Kealey wrote:
> > I am generally concerned about breaking changes that affect existing
> scripts, but I see changes to readline as less problematic,
>
> I also assume shell scripts, but shell scripts for interactive settings.
> Interactive settings can also be a large-scale shell script.


That is a possibility that I hadn't considered. Thankyou for bringing it to
my attention.

> On reflection, this would be a fair compromise, at least in the short
> term.
>
> Does it need to be short-term?  Do we need to remove the feature?
>

Not necessarily; I just hadn't thought it through enough to convince myself
that it should necessarily remain, so I didn't want to commit to the long
term.

> Might we offer guidance that distros include a new binding for M-C-e in
> their supplied /etc/skel which would only affect new users, not existing
> users?
>
> I have a naive question.  Why do people on this thread discuss changing
> the behavior of "\M-\C-e" even though there is still a large space of key
> combinations?  "\M-\C-e" is already used [lots of places...]
> I don't see a reason to introduce unnecessary conflicts when we can just
> pick another key combination (e.g. "\M-\C-x") for
> `shell-expand-and-requote-line'.
>

When you put it like that, it seems entirely reasonable that M-C-e should
remain as-is.
Consider all my previous suggestions to the contrary withdrawn.

-Martin

PS: Sadly M-C-r seems to be already taken, so I can't just hop one key over.


Re: About `M-C-e` expand result `'` failed

2024-02-03 Thread Martin D Kealey
On Sun, 4 Feb 2024, 02:01 Koichi Murase,  wrote:

> I now think I should leave a comment because even Martin (who I believed
> was one of the careful people about backward compatibility as seen in
> [1,2]) seems to suggest a breaking change.
>

That's a fair point.

I am generally concerned about breaking changes that affect existing
scripts, but I see changes to readline as less problematic, since an
interactive user gets the chance to check the replacement before hitting
enter.

Yes users have to learn new behaviours, which isn't ideal, but if my
suggested "unquote" ("dequote"? "strip-quoting"?) bindable function was
also added, the impact on users would be minimal: press M-C-e and then a
second keypress to remove quotes, bringing the input buffer to the same
state as would occur under the current arrangement. (By choosing a default
binding for "unquote" that currently does nothing, people could then use
the same keyboard arpeggio on both old and new bash.)

If the requoting behavior would be desired, I strongly suggest keeping the
> existing behavior of shell-expand-line but adding a separate new
> bindable function (like shell-expand-and-requote-line) to perform the
> expansion and requoting.
>

On reflection, this would be a fair compromise, at least in the short term.

Might we offer guidance that distros include a new binding for M-C-e in
their supplied /etc/skel which would only affect new users, not existing
users?

-Martin


Re: About `M-C-e` expand result `'` failed

2024-02-03 Thread Martin D Kealey
On Wed, 31 Jan 2024 at 01:04, Andreas Schwab  wrote:

> On Jan 30 2024, Zachary Santer wrote:
> > There's no way this is the intended behavior, right?
>
> The command is doing exactly what it is documented to do, that is do all
> of the shell word expansions.
>

If that's how the documentation is interpreted, then clearly there's a bug
in the documentation, because this is almost never *useful* behaviour.

It actually says:

> shell-expand-line (M-C-e) ... This performs alias and history expansion
as well as all of the shell word expansions.

It does *not* say that it does quote removal.

It could be argued that quote removal is implicitly required for some of
the steps to proceed, but in that case it can equally be argued that such
removal must therefore be reversed afterwards.


By all means, add a "strip-quotes" command to readline, so the user can use
in the exceptional cases where you want to diverge from what the shell
would have done without M-C-e, but by default any expansion or substitution
that's triggered by a readline command should render a result that's immune
to that same expansion being done again when the user hits enter.

Which quotes to reinstate probably depends on which expansions have already
been done. This includes history expansion when histreedit is on; for
example, this is unhelpful:

$ ! echo Hi
Hi
$ echo !:0!$
[expands history and re-loads input buffer]
$ echo !Hi
bash_5.1.4p47-release: !Hi: event not found

I suggest that the history expansion should note the result of an expansion
includes a history expansion character ("!" by default), and apply a
modification if it would be recognized as such (followed by
[[:alnum:]_:?%*$-], and not in single quotes).

That modification would be:
 - outside quotes, insert \ before a recognized history character; if
necessary mark this byte to prevent it from being doubled by subsequent
expansions.
 - inside single quotes, nothing
 - inside double quotes, insert "" (two double-quote characters) after it.

Similar safeguards would be needed after any other kind of expansion,
except that it suffices to treat "!" as a separator character for the
purpose of deciding whether or not a word needs to be re-quoted. (As long
as it's always single-quoted, or backslash-escaped, the history characters
don't need any other special treatment.)

One way to decide whether any history chars need protection would simply be
to apply history expansion to the result of the requested expansion, and if
that changes the text, something in it needs protecting. Ideally the
history expansion logic would note the location of any history expansion
characters, so that readline could use that as a hint of what to fix.

-Martin


Re: ./script doesn't work in completion function

2024-01-22 Thread Martin D Kealey
Chet has since pointed out that the debugger is not involved at all.

On Mon, 22 Jan 2024, 18:17 Grisha Levit,  wrote:

>
> That's not quite what happens. These scripts get executed by forking the
> current bash process (without exec). The new shell resets its state and
> runs the script.
>

I'm broadly aware that this is what happens; however it's not impossible
that such could be handled by a short script built into the Shell, rather
than as native C code.

The debugger message is afaict an artifact of not quite resetting
> completely -- if you had extdebug on in the shell from which you ran `./b`,
> the forked shell will try to load the debugger start file (as when running
> `bash -O extdebug ./b`)
>

Okay that makes sense; I leave extdebug turned on in my interactive Shell
so that caller will report more info.

Note that the script still runs, `B` is printed.
>

Yeah I saw that.

So these are both sides effects of the same thing: not properly resetting
the Shell state when creating an interpreter for a script without a #!

-Martin



Re: ./script doesn't work in completion function

2024-01-21 Thread Martin D Kealey
Hi Oğuz

On Sun, 21 Jan 2024 at 03:20, Oğuz  wrote:

> $ echo echo foo bar >s
> $ chmod +x s
>

You seem to have created an invalid executable. It seems that scripts
without a #! can only be run with help from the debugger library; for
example, this is what I get when I run up bash_5.1.3p47 (built from commit
f3a35a2d601a55f337f8ca02a541f8c033682247):


$ cat a
#!/bin/sh
echo A
$ cat b
echo B

$ ./a
A

$ ./b
./b: /home/martin/lib/bash/f3a35a2d601a55f337f8ca02a541f8c033682247/share/bashdb/bashdb-main.inc: No such file or directory
./b: warning: cannot start debugger; debugging mode disabled
B



So I'm guessing that if the debugger is triggered in the middle of a
completion function, it's likely to get stuck. (Maybe it's writing a prompt
to stdout?)

I get the same result for
 bash_5.0.0rc1 (built from commit f250956cb2a8dca13fc0242affc225f9d6983604)
 bash_4.4.23p49 (build from commit 64447609994bfddeef1061948022c074093e9a9f)
 bash_4.4.0p51 (built from commit a0c0a00fc419b7bc08202a79134fcd5bc0427071)

Not exhibited by bash_4.3.x

-Martin


Re: completion very slow with gigantic list

2024-01-16 Thread Martin D Kealey
How about:

Don't sort the list, or consider "lazy sorting" only the portion of the
list that's going to be displayed. (I'd suggest using an incremental
Quicksort, which can yield a sorted sublist in almost linear time. (I
started working on this for my zcomp module until I realised it was already
sorted.))

Maybe change COMPREPLY into an associative array, where the *keys* are the
choices to be displayed.
Or more generally, use some kind of hash table rather than sorting to
remove duplicates. That would replace about N×log(N) calls to strcoll()
with a trivial number of calls to strcmp().
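
For the de-duplication half, something along these lines is already possible
in a completion function (bash 4+; 'candidates' is illustrative, and key
order is unspecified):

declare -A seen=()
for w in "${candidates[@]}"; do seen["$w"]=1; done
COMPREPLY=( "${!seen[@]}" )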

Consider being able to attach a generator to an array, to create entries on
demand. (If it's a subshell, or coprocess, put it in its own pgrp, so you
can nuke it with SIGPIPE when you don't need any more values.) Use
progressive rendering so that (a) you see something immediately, and (b) it
doesn't hold up activity.

-Martin



On Thu, 11 Jan 2024 at 04:53, Dale R. Worley  wrote:

> Eric Wong  writes:
> > Hi, I noticed bash struggles with gigantic completion lists
> > (100k items of ~70 chars each)
>
> A priori, it isn't surprising.  But the question becomes "What
> algorithmic improvement to bash would make this work faster?" and then
> "Who will write this code?"
>
> Dale
>
>


Re: document that read built-in can't return zero-length string in the middle of input

2024-01-13 Thread Martin D Kealey
Random irreversible behaviour change strikes again.

This changed in Bash 4.3.0, and to make matters worse, shopt -s compat42
does not restore the previous behaviour.

Up to Bash 4.2, read -N1 would indeed set the receiving variable to empty
while returning a zero status, having read only one byte.

> $ printf 'a\0b' | bash_4.2.9p1-release -c 'for i in 0 1 2 ; do read -N1
c[i] ; s[i]=$? ; done ; declare -p c s'
> declare -a c='([0]="a" [1]="" [2]="b")'
> declare -a s='([0]="0" [1]="0" [2]="0")'

From Bash 4.3 onwards the behaviour reported by the OP is observed:

> $ printf 'a\0b' | bash_4.3.0p52-release -c 'shopt -s compat42 ; for i in
0 1 2 ; do read -N1 c[i] ; s[i]=$? ; done ; declare -p c s'
> declare -a c='([0]="a" [1]="b" [2]="")'
> declare -a s='([0]="0" [1]="0" [2]="1")'

I had already suspected this was the case, since Xterm mouse tracking would
get glitchy in large windows. (In mouse tracking mode, Xterm will report
the mouse position as ESC [ M c1 c2 c3,  where c1, c2 & c3 are single bytes
encoding the button state, column, and row, each as the value plus 32. So
when the mouse is over column 224 of an Xterm, the column is reported by
sending a NUL byte. If that byte is discarded, the following input byte
could mistakenly be consumed as part of the mouse-position escape sequence;
if it's the ESC that starts the next sequence, things go horribly wrong.)

I "fixed" the issue for me by:
(a) not writing anything beyond column 223, and
(b) rotating my screen to portrait mode, so that a full-screen Xterm is 179
columns and 147 lines.

Oh wait, if I switch to "tiny" font, I get 239 lines. Dang.

-Martin


On Fri, 12 Jan 2024 at 11:02, Greg Wooledge  wrote:

> On Fri, Jan 12, 2024 at 01:29:19AM +0100, Ángel wrote:
> > One might say "reading exactly nchars characters into the name",
>
> I would still find that confusing.  What actually counts is how many
> characters are *stored* in the variable, not how many characters are
> *read* from the input.
>
> > but
> > given that there's no mention that the NULs are never stored in
> > variables, I would tend to add a line below saying e.g. "NUL characters
> > cannot be stored in bash variables and are always ignored by read".
>
> I would be as explicit as possible.  Don't require the reader to put
> any pieces together themselves.
>
> How about this for the man page:
>
> -N nchars
> read returns after storing exactly nchars characters in the
> first named variable (or REPLY if no variable is named), unless
> EOF is encountered or read times out.  read does not wait for
> a complete line of input; any delimiter characters encountered
> in the input are not treated specially, and do not cause read to
> return before storing nchars characters.  NUL characters are
> ignored, as they cannot be stored in variables.  The result is
> not split on the characters in IFS; the intent is that the
> variable is assigned exactly the characters read (with the
> exceptions of NUL and backslash; see the -r option below).  If
> multiple variable names are given, input is only stored in the
> first; all other variables will be empty.
>
> And this for the help text:
>
> -N nchars  return only after storing exactly NCHARS characters, unless
>EOF is encountered or read times out, ignoring any NUL or
>delimiter characters
>
>


Re: $((expr)) allows the hexadecimal constant "0x"

2024-01-09 Thread Martin D Kealey
On Tue, 12 Dec 2023, 05:56 Zachary Santer,  wrote:

> On Mon, Dec 11, 2023 at 9:26 AM Chet Ramey  wrote:
> > Part of the issue here is that distros -- Red Hat, at least -- never
> > upgrade the version of bash they started with. Red Hat will `support'
> > bash-4.2 as long as they support RHEL 7.
> To be fair to the Red Hat people, "my scripts used to work and now they
> don't" is the issue they're trying to avoid, at least within a release.
>

Which is exactly my point about these random changes "because we think it
must be a mistake".

There are users out there that depend on these supposedly-broken features.

It is disingenuous to justify a change on the grounds of "alignment with
the documentation". The manual is not a specification (much less a formal
one), and leaves out a great many things that would be required in the
latter. There are *many* things that are "like" something else, but with
extensions that are unexplained and often completely unmentioned. Moreover,
the manual is huge, and not a tutorial, so most users only consult it when
they already have a problem. (And those of us who do routinely consult it
will perform tests when the manual is unclear.)

The source of truth is the implementation, not the manual, and it is quite
offensive to imply that users are at fault when they don't assume the
reverse. We know there are errors and gaps in the documentation, and so do
the users.

Simply disabling existing behaviours breaks things for people who use them,
and doesn't actually "fix" scripts that are already broken. (Those scripts
may get a better explanation of why they're wrong, but they don't magically
start working as intended by their authors.)

It's backwards to suggest that bare "0x" should be prohibited because the
man page says so. The words “Integer constants follow the C language
definition, without suffixes or character constants” were not added to the
manual until June 2019 in the Devel branch (as commit
48492ffae22d692594757e53fb4580ebb1f506cf), and did not land in the Master
branch until December 2020 (as commit
8868edaf2250e09c4e9a1c75ffe3274f28f38581). Ubuntu LTS users would then have
first had a chance to read them in April 2022.

It is not reasonable to expect users to notice that this one line had been
inserted into the manual page but without any actual change of behaviour,
and then to be told 5 years later that NOW our scripts are wrong because
they don't match the man page.

If behaviour is going to change, it should be announced in advance, as a
preface to the change log. Then at least users will have a chance to find
out about it before it bites them, and perhaps provide feedback that such a
proposed change would be undesirable.

So please:
* EITHER change the man page to match the behaviour (I suggest removing the
words «without suffixes or character constants» and substituting «except
that no digits are required after any prefix, and type modifier suffix
characters ('L', 'S', & 'U') are not accepted. Unlike C, Bash does not
recognize quoted characters as integer constants».

* OR add a "Proposed Future Changes" section to the top of CHANGES, with an
entry explaining this change and its rationale, AND delay actually
implementing the change until after the announcement makes its way into
common LTS channels (let's say, April of even-numbered years).

Now would be a good time to start working on a "proposed changes" section,
so that it will have plenty of time to be incorporated into the LTS
releases in April this year.

-Martin

PS: C is now quite peculiar in having "character constant" mean a kind of
*integer*, and aside from C++, probably no other common language shares
this feature. Even most C programmers do not fully understand what it
means, so VERY few Bash users would understand it without further
explanation. Even if you don't opt for my first preference, I suggest using
my replacement anyway, just without «no digits are required after any
prefix, and».



Re: complete NAME seems to diasable completion for NAME in the case of git

2023-12-21 Thread Martin D Kealey
On Fri, 22 Dec 2023, 05:55 Andreas Schwab,  wrote:

> If you want to print existing completions, use
> complete -p [NAME...].
>

The problem is, there's a documentation error.

"help complete" says "if no options are supplied…" when it should say
something more like "if the -p option is given, or if no arguments are
given…"

-Martin



Re: issue with debug trap

2023-12-18 Thread Martin D Kealey
On Sat, 16 Dec 2023 at 07:21, Giacomo Comes  wrote:

> debugon () {
> trap 'if (($?)); then echo "$((LINENO-1)): $(sed -n "$((LINENO-1))p"
> "$0")" ; fi' DEBUG
> }
> debugoff () {
> trap '' DEBUG
> }
>
>
Although LINENO is the line number of the command that's about to be
executed, that does not imply that LINENO-1 is necessarily the line that
contains the command just executed, whose status is in $?.

It could be wrong if there's a loop or branch, or even just blank lines.
Another problem is that it doesn't tell you which file the line is in.

A better approach is to remember the current line and then use that during
the next trap, perhaps something more like this:

# adjust to taste, especially if you like colour
debug_fmt='%-3u %s:%u %.40s\n'
debug_file=$BASH_SOURCE
debug_line=$LINENO

debugtrace() {
    local e=$?
    if ((e)) ; then
        read -r debug_command < <( tail -n +$debug_line < "$debug_file" )
        printf >&2 "$debug_fmt" "$e" \
            "${debug_file##*/}" \
            "$debug_line" \
            "$debug_command"
    fi
    IFS=' ' read debug_line debug_file < <( caller )
}
debugon() { trap debugtrace DEBUG; }
debugoff() { trap '' DEBUG; }

-Martin


Re: $((expr)) allows the hexadecimal constant "0x"

2023-12-11 Thread Martin D Kealey
On Mon, 11 Dec 2023 at 06:55, Chet Ramey  wrote:

> It came up as a bug report in
>
> https://lists.gnu.org/archive/html/bug-bash/2019-06/msg00042.html
> 
>
> (part of the followup discussion after the second linked thread above)
> and the consensus among those who participated was that it was a good
> thing to prevent base# without any digits from silently being treated
> as 0.


The wrap-up at the end of that discussion was this:

> *I'm not sure how relevant that language is to integer constants in
> expressions. I could also note that the language describing the base#n
> syntax only talks about digits, letters, `@', and `_'. The bash definition
> of arithmetic evaluation is taken from C. That includes integer constants,
> and, while the base#value syntax clearly extends the C definition of a
> constant, the `-' (and `+', FWIW) is still an operator as defined by C.*


One could be forgiven for taking that to mean "this is behaving as expected
and won't be changed".

> Do you think there would have been more discussion in different
> circumstances?


That entire discussion was between 4 people, and started and finished in
less than a week.
Anyone who happened not to be around would have missed it, including me.

Can a consensus among 4 people really be fair and representative of
millions of Bash users?

> Would you have participated, considering there's no sign of you on the
> bug-bash list between 2016 and 2020?
>

When the instructions say "send bug reports to X-bugs", it's natural to
assume that the primary audience for X-bugs is the people who fix bugs, and
other people should stay away. So for a long time I didn't join this list
because I wasn't in a position to contribute as a fixer.

Of course, that's not the only reason why people don't participate:
sometimes real life gets in the way and we don't have time to participate,
sometimes for weeks on end.

But yes, I would have participated had I known this discussion was going on.

In most projects this lack of discussion wouldn't matter: after the
discussion comes a proposed code change, and then code review, and then
even after release, it's an "experimental feature that may be retracted".

When we don't have those extra steps, and every release of bash is
"forever", yes, we need more considered discussion up front.

-Martin

PS: When I said "change the definition of a token", I meant specifically
changing the definition of the integer constant token, to require at least
one "digit" after the '#'.


Re: $((expr)) allows the hexadecimal constant "0x"

2023-12-09 Thread Martin D Kealey
On Sun, 10 Dec 2023, 12:15 Zachary Santer,  wrote:

> On Thu, Nov 30, 2023 at 5:19 AM Martin D Kealey 
> wrote:
> > > > This change will break scripts that use $((10#$somevar)) to cope with
> > > > somevar having leading zeroes OR BEING EMPTY.
> Beside the point, but I wanted to point out how easy this is to work
> around.
>

WRITING a work-around is trivial. I did that a long time ago.
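
For the record, a sketch of the sort of one-line fix I mean, assuming
somevar holds only digits (possibly with leading zeroes) or is empty:

  # prepending a literal 0 keeps "10#" from ever being followed by nothing,
  # without changing the value
  somevar=007
  echo $(( 10#0$somevar ))   # 7
  somevar=
  echo $(( 10#0$somevar ))   # 0 rather than a syntax error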

DEPLOYING a work-around is a much bigger deal. I have no way of contacting
most of my users, and they weren't installing packages that I could push
updates to. I'm still getting sporadic complaints from users I've never
heard of, as they finally upgrade to newer versions of Bash and my old
scripts fail as a result.

To say that I'm peeved by the unnecessary work this has created for me
would be an understatement.

Quick hack changes like this need to be considered carefully, not rushed
through in a few days.

And sometimes Bash isn't the thing that needs fixing. Maybe the manual
needs to be clearer that "#" is not an "operator" like "+" or "/" but
rather part of an unsigned integer constant (and that "-" is NOT part of
such a constant).

But even if you still thought this was worth doing, it wasn't necessary to
make $((10#)) completely illegal: Bash could look ahead and only intervene
if the following character is '-' (affecting $((10#-digits)) but not
$((10#))).

Moreover, it only needed to output a warning, rather than fail, perhaps
“Warning: base# may only be followed by digits; "-" interpreted as
subtraction rather than negation.”

If I were fixing this today, I would introduce a new category of
diagnostics that are only displayed when set -x is in effect (and sent to
>&${BASH_XTRACEFD-2}). Then I'd raise such a warning about 10#- but with
the old zero value.

-Martin



Re: $((expr)) allows the hexadecimal constant "0x"

2023-11-30 Thread Martin D Kealey
Apropos the intentional breakage of ((10#$X)) I wrote (back in June 2023):
> > Is there any chance this can be reversed before it becomes an official
release?
to which Chet replied:
> This happened years ago.

I have to admit that I'm finding that being a maintainer for a large suite
of Bash scripts is a very frustrating experience because Bash keeps
shifting under me.

Many (perhaps most) script maintainers don't hear about a change until
several years after it's incorporated into an official release of Bash,
because that's how long it takes for it to percolate through several tiers
of release processes and actually get installed. We test our scripts for
backwards compatibility when we make changes, but most of us don't have
automated systems to check for *forwards* (or *anticipatory*) compatibility.

This is especially problematic when combined with "every release of Bash is
valid forever". Without the ability to firmly say "ooops, that version of
Bash was broken, everyone please stop using it", this makes for a terrible
experience for maintainers of scripts that then have to run where someone
has chosen the version of Bash that's available.

If there's a bug tracking system beyond "threads in a mailing list", I'd
like to know how I can get access to it.

The "roadmap" seems to consist of Chet occasionally announcing that some
refactoring has been done or is in progress; if there's a list of future
plans somewhere, I'd love to see it clearly published.

Bug triage seems to consist of a day or so's discussion in this list, but
when a solution is proposed it usually gets little or no discussion, and no
code review whatsoever; often we only get "this will be fixed in [an
upcoming] release", with no mention of *how* a problem will be fixed.

There's nobody actively canvassing users, asking "will this [insert
detailed proposal] break your code?". And no, telling this mailing list is
NOT enough, because this list is unrepresentative of maintainers of Bash
scripts and other Bash users. (About now Chet is shrugging his shoulders
and saying "what else can I do". The fix for that is not to have the whole
project crammed into one human's brain. And yes, I do volunteer to help
sort this out.)

(Aside: even as an "experienced" Bash user, I didn't join this list until
relatively recently because I was under the impression that it was only for
people who actually *fixed* the bugs in Bash, not merely complained about
them. A key reason I believed this was because this list is the target
address for the "bashbug" reporting script. Perhaps this list should be
renamed "bash-testers", if that's its true purpose.)

End-of-life dates for previous versions should be promulgated as soon as
possible. I have my own idea of what they should be, but I'd like to hear
*how* other people would choose their dates.

When I said:

> > This change will break scripts that use $((10#$somevar)) to cope with
> > somevar having leading zeroes OR BEING EMPTY.


Chet replied:

> this clearly invalid syntax


"Clearly Invalid" is a matter of perspective. To me "no digits at all" is
the MOST logical way to write value zero, being the logical result of
"remove leading zero digits". (I may be sufficiently atypical that nobody
on this list agrees with me, but I'm definitely not unique globally.)

That change introduced an inconsistency into Bash:

 X= ; ((X == 0))  # true
 X= ; ((10#$X == 0)) # was true, now broken

I liked it better how it was.

(From a language design point of view, the only practical limitation to
treating an empty string as numerically zero is that an empty string needs
to be a recognisable lexical token; this is not the case in human
languages, which is why we write "0" instead, but happily in most computer
languages an empty string is indeed trivially differentiable from the
absence of any token. Bash is the exception in that $x may expand to one,
many, or no tokens, depending on context, but in *this* context it's clear.)

Chet said:

> You might be interested in the discussion:


> https://lists.gnu.org/archive/html/bug-bash/2018-07/msg00015.html
> https://lists.gnu.org/archive/html/bug-bash/2019-06/msg00039.html


I read that and wept.

Just because someone didn't understand the difference between a single
token and an entire arithmetic expression doesn't mean "change the
definition of a token" was the right response.

(What I really don't get is why were they trying to use an explicit decimal
radix on something that was the result of a previous arithmetic expansion,
and therefore already guaranteed to be decimal?)

Even if this seemed to be clearly the reasonable response, why was there no
clear & separate solicitation of feedback for "we propose to invalidate
"10#" without any following digits"?

> >> However, a somewhat similar situation with hex prefix,
> >> 0xDIGITS, still allows just "0x" as a valid zero constant.
> >>
> >> Not sure whether this should be considered a bug,
> >> and whether it's worth fixing - just lettin

Re: Command hangs when using process substitution

2023-11-18 Thread Martin D Kealey
Perhaps some background would help here; I start by highlighting this
section in "man xclip":

   -l, -loops
  number  of X selection requests (pastes into X applications)
to wait for before exiting, with a value of 0 (default) causing xclip to
wait for an unlimited number of requests until another application
(possibly another invocation of xclip) takes ownership of the selection

What's mentioned here as "the selection" is approximately what you probably
think of as "the clipboard".

X-windows does not itself hold the clipboard. Instead, that responsibility
falls to the program offering the copy. When a program does a "copy", it
notifies the X server that it has a "selection" available; then subsequent
requests for pasting are forwarded to that program, and it replies with the
selection to be pasted. The X server notifies the original program when the
selection is no longer required, in particular when it has been supplanted
by a notification from another program. If the original program dies or
exits, the clipboard dies with it.

This means that xclip must notify the X server that it has a selection and
then stay active so that a later program can ask for it, but at the same
time, xclip is expected to exit immediately once it has captured its stdin.

It manages to do both simply by forking.

However, the new child process thus created does not close its stdin, stdout
or stderr.

The forked child will exit when the X server tells it that its selection is
no longer required, and that's the delay that you're seeing.

On Sat, 18 Nov 2023 at 23:36, dbarrett--- via Bug reports for the GNU
Bourne Again SHell  wrote:

> The following command, when run in bash, should copy the word "foo" to
> the X primary selection (by process substitution), output the line
> "fxx", and exit:
>
> echo foo | tee >(xclip -i) | tr o x
>
> The command does print "fxx" but then it hangs
>

> The same command behaves correctly when run in zsh
>

If zsh is not connecting the output of xclip to the input of tr, I would
regard that as INcorrect.

I note that both of the following "hang" the same in zsh and bash:

  echo foo | xclip -i | tr o x
  echo foo | ( xclip -i ) | tr o x

... because both wait for the forked child of xclip to finish.

The fact that you don't *want* the output of xclip connected to tr (because
it makes tr wait for xclip *and all its children* to finish, while the
shell waits for tr to finish) does not make zsh "correct".
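
For what it's worth, a minimal work-around (assuming you don't need
xclip's stdout) is to aim it away from the pipeline, so that the forked
child no longer holds the write end of the pipe to tr:

  echo foo | tee >(xclip -i >/dev/null) | tr o x

With that, tr sees EOF as soon as tee exits, no matter how long xclip's
child lingers.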

-Martin


Re: test -lt inconsistent about white space

2023-10-29 Thread Martin D Kealey
I'm more concerned that the error message is misleading; "integer
expression expected" is NOT true; rather an integer LITERAL is expected
(meaning an optional sign followed by one or more digits).
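
To see why that wording grates:

  $ test '1 + 1' -lt 3
  bash: test: 1 + 1: integer expression expected

"1 + 1" is a perfectly good integer expression; what test actually wants
here is an integer literal.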

As for fixing the inconsistency, I would rather get rid of whitespace
skipping entirely, perhaps with a shopt to re-enable it.

-Martin

On Sun, 29 Oct 2023 at 05:08, Paul Eggert  wrote:

> Consider the following shell script 'doit':
>
> sp=' '
> nl='
> '
> test "${sp}1${sp}" -lt "${sp}2${sp}"
> test "${nl}3${sp}" -lt "${nl}4${sp}"
> test "${sp}5${nl}" -lt "${sp}6${nl}"
> test "${nl}7${nl}" -lt "${nl}8${nl}"
>
> Running the command "bash doit" outputs:
>
> doit: line 6: test:  5
> : integer expression expected
> doit: line 7: test:
> 7
> : integer expression expected
>
> The problem occurs because strtoimax accepts all forms of leading
> whitespace, whereas Bash accepts only space and tab after the integer.
> This is inconsistent: Bash should treat trailing whitespace the same way
> it treats leading whitespace, and should accept all of doit's 'test'
> commands, as Dash does.
>
> Proposed patch attached.


Re: bash tries to parse comsub in quoted PE pattern

2023-10-22 Thread Martin D Kealey
On Wed, 18 Oct 2023 at 22:19, Zachary Santer  wrote:

> On Tue, Oct 17, 2023 at 5:56 PM Emanuele Torre 
> wrote:
>
> > bash-5.1$ letters=( {a..z} ); echo "${letters["{10..15}"]}"
> > k l m n o p
>
...

> So how important is it to maintain undocumented behavior?
>

This is the expected outcome of doing brace expansion *followed by*
variable expansion, with separate parsing phases. Bash does not generally
document what happens when you combine features if it can reasonably be
inferred from the documentation of the individual features.

First, brace expansion translates:

    "${foo["{10..15}"]}"

into

    "${foo["10"]}" "${foo["11"]}" "${foo["12"]}" "${foo["13"]}"
    "${foo["14"]}" "${foo["15"]}"

and then the variables are expanded individually.

> Then there's the question "Was that even supposed to work like that?"


Supposed to? Yes. A good idea? Probably not. Useful? Definitely, but there
should be a better way to do it.

> If so, you'd think it would generalize to being able to pass a series of
> whitespace-delimited indices to an array expansion.
>

That does not follow. The shell grammar is characterised by having a large
number of phases, which means brace expansion occurs without regard for the
grammatical validity of "${var[" as its prefix or "]}" as its suffix; when
the brace expansion occurs, the prefix and suffix are not yet considered as
being a variable expansion.

An array or list inside the [] won't be expanded until after the outer
variable expansion has begun, so it would require entirely new rules
for handling array subscripts that are "lists". And that *would* need to be
separately documented.

Of course, I would love to see something like "${array[10..15]}" work
"properly", unlike the current behaviour of "${array[@]:10:6}"

(The latter unnecessarily exposes the user to the implementation details of
"array". Whenever I complain about this, the responses justifying the
current behaviour as "sensible" tacitly rely on "it's a list, not an
array", which is quite perverse given that all the documentation is about
"arrays".)

> Why does "${#@}" expand to the same thing as "${#}"?


You're right, we don't need ${#} and we should get rid of it. 

${#@} follows the pattern of correspondence between ${array[@]} and ${@};
they both take all the same modifiers, including the # prefix for "count".

Ignoring which one came first, $# can be seen simply as a required
shorthand for ${#@}.


> Why is $[  ]  equivalent to $((  ))?


Because POSIX was too slow choosing which one to endorse (there were
several proposals), by which time Bash had already implemented $[...], and
people had started using it. (And as a matter of language design, $[...] is
simpler to get right; no need to look ahead for '))' to see if it's
actually a command substitution.)

> Does that stuff need to continue to work forever?
>

For some combination of default modes, yes, it must. There are still
scripts out there running that haven't been modified this millennium.

That's not to say that there shouldn't be a "lint" mode which warns about
obsolete features, or a "strict" mode which aborts if they're used, but
they can't just be removed from the default.

-Martin


Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 02:03 Robert Elz,  wrote:

> Date: Sat, 14 Oct 2023 14:46:12 +1000
> From:    Martin D Kealey 
> Message-ID:   a2+3nnknhm5a+...@mail.gmail.com>
>
>
>   | Back when I used the Bourne Shell we didn't have `local`, so we used to
>   | write `var= func` to make sure that `func` couldn't mess with *our*
> `var`.
>
> If you were using the original Bourne shell you couldn't have done that,
> as it had no functions either.


Fair point, it was just whatever was /bin/sh on Ultrix at the time. I was a
uni student so I don't even know what version of Ultrix we were using.

I take your point that the Shell (and especially Bash) has grown
Frankenfeatures way beyond a mere command interpreter, in ways that are
fundamentally irreconcilable.

But I don't think sticking to our guns about "let's go back to simple" is
the best way forward.

The one thing to be said for the Shell is that it's universal. If we kill
it, what will take its place? I already have to install Bash, Awk, Perl,
Python, and Node just to have a running system. How many more will be
needed after Bash finally dies?

If the Shell is left out in no man's land, with a shortfall in features so
it can't be a "real" programming language, but at the same time with the
crazy complexity for users to learn, we pretty much doom it to extinction.

If the Shell is truly a moribund legacy language, we should stop changing
it. No new features. No "bug fixes". No new safety guards.

Or we design a new language that feels more like a regular programming
language even if its syntax is weird. In my opinion it should have:
* proper per-package feature selection;
* proper lexically scoped variables & functions;
* opt-in rather than opt-out globbing & word splitting;
* opt-in rather than opt-out filedescriptor inheritance;
* strongly typed variables, with string/number/array/compound/filehandle
values;
* distinguishable binary (octet-stream) and text (Unicode/utf-8) strings,
with support for null bytes, and a Cstring attribute to prohibit
assignments that include null bytes (because execve is so central to
everything);
* support for AF_LOCAL sockets as bidirectional pipes;
* exceptions separate from exit-status, with the ability to enrol some but
not all commands for the set-e treatment.

Yes that's far too much work for one person; I do not expect Chet to do all
this, I expect there to be a governance team.


Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 03:15 Ti Strga,  wrote:

> On Fri, Oct 13, 2023 at 5:59 PM Grisha Levit 
> wrote:
> > IMHO you'd be better off just putting a `{` line at the start and `}`
> line at the end of your scripts
>

> The big weakness of the "{}" approach is that if a writer forgets to do
> that, there's no way to detect it until a script is modified and the
> running one crashes.  But in the case of cloning, we can add such explicit
> test-and-detection for "did you forget to trigger the cloning" in the few
> scripts that really, really need it.
>

I think I would attack this from an entirely different angle: what about
simply modifying Bash so that it slurps in the entire file upon opening it?

You could even hide that inside an LD_PRELOAD module so you don't have to
recompile Bash, and so that it's inherited automatically.
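
For anyone who hasn't been bitten: the underlying hazard is that Bash goes
back to the script file for more input between commands, so a script that
is appended to while running changes what gets executed. A sketch (file
name illustrative):

  cat > /tmp/demo.sh <<'EOF'
  sleep 2
  echo one
  EOF
  bash /tmp/demo.sh &
  sleep 1
  echo 'echo two' >> /tmp/demo.sh
  wait    # prints "one" then "two": the late-appended line was executed

Slurping the whole file at open time would close that window.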

-Martin


Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-14 Thread Martin D Kealey
On Sun, 15 Oct 2023, 03:05 Greg Wooledge,  wrote:

> On Sat, Oct 14, 2023 at 12:55:21PM -0400, Ti Strga wrote:
> > it's just the "[[ -v foo ]]" tests to see where along the cloning
> process we are.
>
> *Shudder*
>

Likewise.

> If the *real* goal is to overwrite a running script with a new version of
> itself, and then re-exec it, then the correct solution is to wrap the
> script in a single compound command so that it gets read and parsed up
> front, before beginning execution of the main loop.  Either wrap the whole
> thing in "{" ... "}" as Grisha suggested, or wrap the whole thing in a
> "main()" function and then call main "$@".
>

Agreed. Either way, don't forget to put "exit;" just before the closing
"}". Or write « exec main "$@" ».

(For good measure I would also make sure it's a valid posix text file with
a terminal newline, so that cat rubbish >> script can't break it.)
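
A minimal sketch of the shape I mean (body otherwise unchanged):

  #!/bin/bash
  {
  # ... entire script body here ...
  exit
  }

Because the whole compound command is parsed before any of it runs,
rewriting the file mid-execution can no longer change this invocation, and
the "exit" guarantees that anything appended after the closing brace can
never be executed.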

Personally I don't much care for the main "$@" style as it makes an extra
copy of argv for no particularly good reason, and Shell Is Not C™; but it's
better than allowing the script to blow up with parse errors after it's
started running.

-Martin



Re: variable set in exec'ing shell cannot be unset by child shell

2023-10-13 Thread Martin D Kealey
On Sat, 14 Oct 2023 at 06:33, Robert Elz  wrote:

> The issue we have (which possibly might be similar in bash, but only
> possibly - but it would explain the symptoms) is that when one does
>
> VAR=value command
>
> "VAR" is essentially made a local variable for command, so its value
> in the outlying environment is unchanged by the assignment of value.
>
...

> But when the command is a function, or a shell builtin (or the '.'
> command, which is almost both of those) then we have some strange
> effects.
>
...

> But that's wrong, all "VAR=foo command" is supposed to do, is to put
> VAR into the environment of command, without altering it in the shell
> environment that is executing command.   If command is a function, or
> a '.' script (or a shell builtin, which was the context in which I
> first considered this issue) which alters VAR, the global VAR should
> be altered
>

Respectfully I must disagree.

This aspect of Bash's behaviour has a very long historical precedent.

Back when I used the Bourne Shell we didn't have `local`, so we used to
write `var= func` to make sure that `func` couldn't mess with *our* `var`.
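
To spell out that idiom (names illustrative; behaviour as in Bash's
default mode):

  func() { var=clobbered; }
  var=precious
  var= func        # temporary binding lasts only for the call
  echo "$var"      # prints "precious": func only wrote to the binding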

Given that "put in the environment" actually means "create a shell variable
and mark it as exported", it's difficult to see how "only put into the
environment but don't make it a local variable" could work without making
the semantics even more contorted and confusing.

It seems to me that what's needed is a new model for variables, where the
entire scope chain can be inspected and modified where necessary, and where
the existing declare/local/export/typeset and unset are simply shorthands
for more comprehensive operations.


Re: Warn upon "declare -ax"

2023-09-05 Thread Martin D Kealey
On Wed, 6 Sep 2023, 01:46 Kerin Millar,  wrote:

> My pet name for it is arrayshock.
>
> $ arr=(foo bar baz)
> $ export arr
> $ env | grep ^BASH_ARRAY_
> BASH_ARRAY_arr%%=([0]="foo" [1]="bar" [2]="baz")
> $ ./bash -c 'declare -p arr'
> declare -ax arr=([0]="foo" [1]="bar" [2]="baz")
>

I've often wondered why it was designed this convoluted way, rather than
simply putting separate items into the environment, thus:

  arr=foo
  arr[0]=foo  # both of these have reasonable justifications and reasonable
              # repudiations; choosing one is a different discussion
  arr[1]=bar
  arr[2]=baz
  arr-=a      # maybe, to indicate declare -a.

I vaguely recall that there was some notion that POSIX required "only valid
C identifiers" (alphanumeric and underscore but without a leading digit) to
the left of "=", but in that case the current scheme using "%%" is not
compliant either.

-Martin



Re: RFC: changing printf(1) behavior on %b

2023-09-01 Thread Martin D Kealey
If compatibility with C is really that important, shouldn't we be fixing
%c? Its current behaviour as a synonym for %.1s doesn't provide significant
utility, and differs from C, where %c means "take an int and output the
corresponding single byte", not "take the first byte of a string and output
that".
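
To make that concrete:

  printf '%c\n' hello    # bash prints just "h", i.e. it behaves like %.1s

whereas C's printf("%c\n", 104) prints "h" derived from the integer 104.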


Whilst I wouldn't object to adding %#s (or %#b for that matter), I'm
uncomfortable about changing existing behaviour, especially when it's just
for the sake of linguistic simplicity in the standard.

Plenty of projects have functions that accept a format string and pass it
through to printf (sometimes with names like warnf, errorf, panicf); it
would be non-trivial to locate indirect format string parameters. An
estimate of "a few years" is WAY short of the timeframe needed to weed out
old usage; embedded devices typically run the same version of bash from the
time they leave the factory until they reach the scrap disassembly plant
(or landfill) a decade or more later.

One of the benefits of printf over echo is that there aren't two mutually
incompatible ways of interpreting the data; this would take us back to the
bad old days of having to dynamically select the format string depending on
which version of the Shell the script is running under.
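
The two interpretations at stake, side by side:

  printf '%s\n' 'a\tb'   # backslashes pass through untouched: a\tb
  printf '%b\n' 'a\tb'   # escapes are processed: a<TAB>b

Repurposing %b for C2x-style binary literals would silently change the
second line in every script that already relies on it.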

Please no.

-Martin

On Fri, 1 Sept 2023 at 01:35, Eric Blake  wrote:

> In today's Austin Group call, we discussed the fact that printf(1) has
> mandated behavior for %b (escape sequence processing similar to XSI
> echo) that will eventually conflict with C2x's desire to introduce %b
> to printf(3) (to produce 0b000... binary literals).
>
> For POSIX Issue 8, we plan to mark the current semantics of %b in
> printf(1) as obsolescent (it would continue to work, because Issue 8
> targets C17 where there is no conflict with C2x), but with a Future
> Directions note that for Issue 9, we could remove %b entirely, or
> (more likely) make %b output binary literals just like C.  But that
> raises the question of whether the escape-sequence processing
> semantics of %b should still remain available under the standard,
> under some other spelling, since relying on XSI echo is still not
> portable.
>
> One of the observations made in the meeting was that currently, both
> the POSIX spec for printf(1) as seen at [1], and the POSIX and C
> standard (including the upcoming C2x standard) for printf(3) as seen
> at [3] state that both the ' and # flag modifiers are currently
> undefined when applied to %s.
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html
> "The format operand shall be used as the format string described in
> XBD File Format Notation[2] with the following exceptions:..."
>
> [2]
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05
> "The flag characters and their meanings are: ...
> # The value shall be converted to an alternative form. For c, d, i, u,
>   and s conversion specifiers, the behavior is undefined.
> [and no mention of ']"
>
> [3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html
> "The flag characters and their meanings are:
> ' [CX] [Option Start] (The .) The integer portion of the
>   result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G )
>   shall be formatted with thousands' grouping characters. For other
>   conversions the behavior is undefined. The non-monetary grouping
>   character is used. [Option End]
> ...
> # Specifies that the value is to be converted to an alternative
>   form. For o conversion, it shall increase the precision, if and only
>   if necessary, to force the first digit of the result to be a zero
>   (if the value and precision are both 0, a single 0 is printed). For
>   x or X conversion specifiers, a non-zero result shall have 0x (or
>   0X) prefixed to it. For a, A, e, E, f, F, g, and G conversion
>   specifiers, the result shall always contain a radix character, even
>   if no digits follow the radix character. Without this flag, a radix
>   character appears in the result of these conversions only if a digit
>   follows it. For g and G conversion specifiers, trailing zeros shall
>   not be removed from the result as they normally are. For other
>   conversion specifiers, the behavior is undefined."
>
> Thus, it appears that both %#s and %'s are available for use for
> future standardization.  Typing-wise, %#s as a synonym for %b is
> probably going to be easier (less shell escaping needed).  Is there
> any interest in a patch to coreutils or bash that would add such a
> synonym, to make it easier to leave that functionality in place for
> POSIX Issue 9 even when %b is repurposed to align with C2x?
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>
>
>


Re: give_terminal_to() / maybe_give_terminal_to() race

2023-09-01 Thread Martin D Kealey
On Fri, 1 Sep 2023, 15:51 Earl Chew,  wrote:

> The controlling terminal must be reconfigured before the parent gets to
> wait() for the job,


> and before the child gets to exec() the program (or their equivalents).


This second point makes sense, but I don't really see why the first point
is necessary. As long as the terminal has changed ownership when waitpid()
returns, why does it matter *when* exactly? What else – if anything – needs
doing after changing the tty pgrp and before calling waitpid()?

I would have thought the obvious fix was to move setting the tty pgrp into
the child; this would guarantee that it happens before the execve() and
before the waitpid() returns.

-Martin

