Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread Greg Wooledge
On Thu, Jan 11, 2024 at 08:02:04PM -0500, Greg Wooledge wrote:
> And this for the help text:
> 
> -N nchars  return only after storing exactly NCHARS characters, unless
>            EOF is encountered or read times out, ignoring any NUL or
>            delimiter characters

Actually, that's not good wording.  It implies that NUL and delimiter
characters are treated in the same way, but they aren't.

-N nchars  return only after storing exactly NCHARS characters, unless
           EOF is encountered or read times out, ignoring any NULs, and
           not stopping for delimiters
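
A minimal sketch of the two behaviours that wording separates (the input
bytes and the variable name `value` are made up for illustration, and a
bash with -N support is assumed): the NUL is dropped and not counted,
while the newline is stored like any other character.

```bash
# Input bytes: 'a', NUL, 'b', newline, 'c', 'd'
read -r -N 4 value < <(printf 'a\0b\ncd')

# The NUL was skipped; the newline was stored instead of ending the read.
printf '%q\n' "$value"
printf 'stored %d characters\n' "${#value}"
```

The variable ends up holding four characters even though five bytes had
to be consumed from the input to get them.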



Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread Greg Wooledge
On Fri, Jan 12, 2024 at 01:29:19AM +0100, Ángel wrote:
> One might say "reading exactly nchars characters into the name",

I would still find that confusing.  What actually counts is how many
characters are *stored* in the variable, not how many characters are
*read* from the input.

> but
> given that there's no mention that the NULs are never stored in
> variables, I would tend to add a line below saying e.g. "NUL characters
> cannot be stored in bash variables and are always ignored by read".

I would be as explicit as possible.  Don't require the reader to put
any pieces together themselves.

How about this for the man page:

-N nchars
read returns after storing exactly nchars characters in the
first named variable (or REPLY if no variable is named), unless
EOF is encountered or read times out.  read does not wait for
a complete line of input; any delimiter characters encountered
in the input are not treated specially, and do not cause read to
return before storing nchars characters.  NUL characters are
ignored, as they cannot be stored in variables.  The result is
not split on the characters in IFS; the intent is that the
variable is assigned exactly the characters read (with the
exceptions of NUL and backslash; see the -r option below).  If
multiple variable names are given, input is only stored in the
first; all other variables will be empty.
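
A short sketch of the last two sentences, assuming the behaviour
described above (the variable names `first` and `second` and the input
string are only illustrative): with -N the input is not split on IFS,
and any additional names are left empty.

```bash
second='old value'

# Five characters are stored, spaces included; nothing is split on IFS.
read -r -N 5 first second <<< 'x y z extra'

printf 'first=<%s>\n' "$first"
printf 'second=<%s>\n' "$second"
```

Without -N the same input would have been split, with "x" going to the
first name and the remainder to the second.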

And this for the help text:

-N nchars  return only after storing exactly NCHARS characters, unless
           EOF is encountered or read times out, ignoring any NUL or
           delimiter characters



Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread Ángel
On 2024-01-11 at 09:29 -0500, Chet Ramey wrote:
> On 1/11/24 2:37 AM, ilya Basin wrote:
> > Dear.
> > I needed to read 16 bytes from a binary file and tried to replace a
> hexdump call with read built-in. I expected that with "-N1" if a NUL
> character is encountered bash would assign an empty string, however
> there's no indication that a NUL character was there and it simply
> assigns the next non-NUL character to the variable.
> 
> The read builtin skips over NUL characters because you can't store them
> as part of the value of a shell variable. That seems obvious.

While doing  read -N 16 from a file bigger than 16 bytes and ending up
with less than that (e.g. only 10 bytes because the other 6 turned out
to be NULs) would be surprising, the assumption of the OP that when
using -N1 a NUL byte would become an empty string doesn't seem far-
fetched.



> What would you like to see documented? That NUL characters don't
> count towards the number of characters read to satisfy -N? Doesn't
> that follow from the above?

Not necessarily. The phrase "reading exactly nchars characters" depends
on (a) what you consider to be reading, as you are read(2)ing more
bytes than stated with -N; and (b) what you consider to be a character.

One might say "reading exactly nchars characters into the name", but
given that there's no mention that the NULs are never stored in
variables, I would tend to add a line below saying e.g. "NUL characters
cannot be stored in bash variables and are always ignored by read".


Regards




Re: completion very slow with gigantic list

2024-01-11 Thread alex xmb sw ratchev
On Thu, Jan 11, 2024, 23:21 Chet Ramey  wrote:

> On 1/10/24 12:28 AM, Eric Wong wrote:
> > Hi, I noticed bash struggles with gigantic completion lists
> > (100k items of ~70 chars each)
> >
> > It's reproducible with both LANG+LC_ALL set to en_US.UTF-8 and C,
> > so it's not just locales slowing things down.
> >
> > This happens on the up-to-date `devel' branch
> > (commit 584a2b4c9e11bd713030916d9d832602891733d7),
> > but I first noticed this on Debian oldstable (5.1.4)
> >
> > strcoll and strlen seem to be at the top of profiles, and
> > mregister_free when building devel with default options...
> > ltrace reveals it's doing strlen repeatedly on the entire
> > (100k items * 70 chars each = ~7MB)
>
> OK. Let's look at what happens. Let's say the text you're trying to
> complete is "a" (or "").
>
> You generate a huge list of strings and store that list into a string,
> which the shell has to split into individual words (there's only one) and
> run through word expansion, as part of running compgen -W on it.
>

dunno : this list as flat text , and the matches as regex ..

> Since you have to return just words from the list, you need to run
> strcmp/strcoll against each member of the list to figure out which ones
> to output to store in COMPREPLY. At least that's all done in a subshell.
>
> So you end up with every substring in $wordlist as part of $COMPREPLY.
>
> Then you hand that list to readline, which runs through it to find the
> longest common substring of all the matches. This is where you get a
> ton of strlen calls, since mbrtowc needs to know the maximum number of
> bytes to examine when converting strings to potentially multibyte
> characters. Room for improvement here, but strlen is pretty cheap.
> Still, it would reduce the number of calls.
> (lib/readline/complete.c:compute_lcd_of_matches()).
>
> You end up with a list of matches that readline wants: LCD in matches[0],
> the matches in the rest of the match list, NULL terminated. Now you have
> to postprocess it.
>
> You have to remove duplicate matches, since you didn't tell readline to
> keep them. To make that easier, you have to sort them (since you didn't
> tell readline not to). That's where you get the calls to strcmp/strcoll --
> you can't avoid them. You can't count on qsort removing duplicates for
> you, so you have to run through the array again, comparing each element
> against the next until they don't match, marking the ones that do for
> removal -- more strcmp/strcoll calls.
>
> Now you have the list of possible matches, and readline will either insert
> the longest common substring or display the match list. If you're going to
> display the match list, you have to run through the array again to
> determine the longest match, do any required processing for the
> completion-prefix-display-length and completion-display-width variables,
> possibly color the common prefix, then print the matches, which will
> call strlen() on each match to determine how much screen space the match
> will take.
>
> A lot of this is caused by readline's passing around string vectors
> (char **) instead of some struct that held a string and its length. But
> that's the public API, and I have higher-priority things to do than
> redo readline's internal completion architecture.
>
> You're welcome to take your shot, and make improvements where there are
> improvements to be made. I'd be glad to take them.
>
> Chet
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>


Re: completion very slow with gigantic list

2024-01-11 Thread Chet Ramey

On 1/10/24 12:28 AM, Eric Wong wrote:

Hi, I noticed bash struggles with gigantic completion lists
(100k items of ~70 chars each)

It's reproducible with both LANG+LC_ALL set to en_US.UTF-8 and C,
so it's not just locales slowing things down.

This happens on the up-to-date `devel' branch
(commit 584a2b4c9e11bd713030916d9d832602891733d7),
but I first noticed this on Debian oldstable (5.1.4)

strcoll and strlen seem to be at the top of profiles, and
mregister_free when building devel with default options...
ltrace reveals it's doing strlen repeatedly on the entire
(100k items * 70 chars each = ~7MB)


OK. Let's look at what happens. Let's say the text you're trying to
complete is "a" (or "").

You generate a huge list of strings and store that list into a string,
which the shell has to split into individual words (there's only one) and
run through word expansion, as part of running compgen -W on it.

Since you have to return just words from the list, you need to run
strcmp/strcoll against each member of the list to figure out which ones
to output to store in COMPREPLY. At least that's all done in a subshell.

So you end up with every substring in $wordlist as part of $COMPREPLY.
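
For reference, a minimal sketch of the kind of completion function being
described. The reporter's actual function is not shown in this message,
so the names `_demo_complete` and `demo` and the tiny word list are
hypothetical; only compgen -W, $wordlist and COMPREPLY come from the
description above.

```bash
# Hypothetical word list; the report involved roughly 100k entries.
wordlist='alpha alps beta gamma'

_demo_complete() {
    # Word currently being completed, as provided by bash.
    local cur=${COMP_WORDS[COMP_CWORD]}
    # compgen -W splits and expands $wordlist, then prints each word
    # that starts with $cur, one per line.
    mapfile -t COMPREPLY < <(compgen -W "$wordlist" -- "$cur")
}
complete -F _demo_complete demo

# Simulate completing "demo al<TAB>" outside an interactive session.
COMP_WORDS=(demo al) COMP_CWORD=1
_demo_complete
printf '%s\n' "${COMPREPLY[@]}"
```

With an empty word to complete, every entry of the list matches, which
is the case where all of $wordlist ends up in COMPREPLY.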

Then you hand that list to readline, which runs through it to find the
longest common substring of all the matches. This is where you get a
ton of strlen calls, since mbrtowc needs to know the maximum number of
bytes to examine when converting strings to potentially multibyte
characters. Room for improvement here, but strlen is pretty cheap.
Still, it would reduce the number of calls.
(lib/readline/complete.c:compute_lcd_of_matches()).

You end up with a list of matches that readline wants: LCD in matches[0],
the matches in the rest of the match list, NULL terminated. Now you have
to postprocess it.

You have to remove duplicate matches, since you didn't tell readline to
keep them. To make that easier, you have to sort them (since you didn't
tell readline not to). That's where you get the calls to strcmp/strcoll --
you can't avoid them. You can't count on qsort removing duplicates for
you, so you have to run through the array again, comparing each element
against the next until they don't match, marking the ones that do for
removal -- more strcmp/strcoll calls.
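
The sort-then-compare-neighbours pass can be sketched in shell terms.
This only illustrates the algorithm; readline does it in C on a char **
array, and the array names and sample data here are made up.

```bash
matches=(beta alpha beta gamma alpha)

# Step 1: sort, so that duplicates become adjacent.
mapfile -t sorted < <(printf '%s\n' "${matches[@]}" | LC_ALL=C sort)

# Step 2: walk the sorted array, keeping an element only if it differs
# from its predecessor. This costs one more comparison per element.
unique=()
for ((i = 0; i < ${#sorted[@]}; i++)); do
    if ((i == 0)) || [[ ${sorted[i]} != "${sorted[i-1]}" ]]; then
        unique+=("${sorted[i]}")
    fi
done

printf '%s\n' "${unique[@]}"
```

Sorting alone leaves both copies of "alpha" and "beta" in place, which
is why the second pass is needed.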

Now you have the list of possible matches, and readline will either insert
the longest common substring or display the match list. If you're going to
display the match list, you have to run through the array again to
determine the longest match, do any required processing for the
completion-prefix-display-length and completion-display-width variables,
possibly color the common prefix, then print the matches, which will
call strlen() on each match to determine how much screen space the match
will take.

A lot of this is caused by readline's passing around string vectors
(char **) instead of some struct that held a string and its length. But
that's the public API, and I have higher-priority things to do than
redo readline's internal completion architecture.

You're welcome to take your shot, and make improvements where there are
improvements to be made. I'd be glad to take them.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: bash-4-2 issue

2024-01-11 Thread Chet Ramey

On 1/11/24 10:55 AM, Chet Ramey wrote:
On 1/11/24 3:46 AM, Sam Kappen via Bug reports for the GNU Bourne Again 
SHell wrote:




Thanks.
I am using a Linux host with kernel version 4.x for cross building.
It looks like autoconf is not defining the "PGRP_PIPE" macro variable as
there is no check for linux kernel version 4.


The uname check works, and the current version just sets it unconditionally
for linux kernel versions that aren't 1 or 2. It's probably time to see why
the BASH_SYS_PGRP_SYNC autoconf test isn't working, since the uname check
isn't appropriate for cross-building.


There's nothing actually wrong with that test; the error Grisha sees is
happening because the setpgid() is attempting to use a process group that
really doesn't exist, not the zombie process pid that the test checks.

Depending on how the timing goes, the wait for /usr/bin/true in the DEBUG
trap command for the second pipeline element will reap the first pipeline
process (the pgrp leader) before the second pipeline element has a chance
to try and set its process group.

The PGRP_PIPE define makes all the pipeline elements wait to execute until
they've all been created and set their pgrp, so it prevents this from
happening.
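
Because the failure depends on timing, a single run may or may not show
it. A rough sketch for repeating the reproducer posted elsewhere in this
thread and counting how often the message appears (the loop, the attempt
count and the variable names are my own, and `type -P true` is used only
because the location of true varies between systems):

```bash
true_bin=$(type -P true)   # e.g. /usr/bin/true
attempts=20
hits=0

for ((i = 0; i < attempts; i++)); do
    # Job control on, DEBUG trap runs an external command, then a pipeline.
    if bash -c "set -m; trap '$true_bin' DEBUG; :|:" 2>&1 | grep -q 'setpgid'; then
        hits=$((hits + 1))
    fi
done

printf '%d of %d runs reported a setpgid error\n' "$hits" "$attempts"
```

A count of zero does not prove the race is absent; it may simply not
have been hit on that machine, or the bash under test may have been
built with PGRP_PIPE defined.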


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread alex xmb sw ratchev
On Thu, Jan 11, 2024, 15:29 Chet Ramey  wrote:

> On 1/11/24 2:37 AM, ilya Basin wrote:
> > Dear.
> > I needed to read 16 bytes from a binary file and tried to replace a
> hexdump call with read built-in. I expected that with "-N1" if a NUL
> character is encountered bash would assign an empty string, however there's
> no indication that a NUL character was there and it simply assigns the next
> non-NUL character to the variable.
>
> The read builtin skips over NUL characters because you can't store them
> as part of the value of a shell variable. That seems obvious.
>

additional metadata , that keeps internally track of \0 positions , for
like printf or so later ..

> What would you like to see documented? That NUL characters don't count
> towards the number of characters read to satisfy -N? Doesn't that follow
> from the above?
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>


Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread Greg Wooledge
On Thu, Jan 11, 2024 at 09:29:03AM -0500, Chet Ramey wrote:
> The read builtin skips over NUL characters because you can't store them
> as part of the value of a shell variable. That seems obvious.

I would argue that it's not obvious at all when using -N.  The help
text for -N says "return only after reading exactly NCHARS characters".
Having it return after reading NCHARS + 1 characters, just because one
of them was NUL, is a surprise.
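
A small sketch of that surprise (the input bytes and the names `stored`
and `rest` are illustrative): asking for two characters consumes three
bytes when one of them is NUL, which shows up in what is left for the
next reader of the same input.

```bash
{
    # Stores 'a' and 'b'; the NUL between them is consumed but not counted.
    read -r -N 2 stored
    # Whatever read did not consume is still available here.
    rest=$(cat)
} < <(printf 'a\0bcd')

printf 'stored=<%s> rest=<%s>\n' "$stored" "$rest"
```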

> What would you like to see documented? That NUL characters don't count
> towards the number of characters read to satisfy -N?

That would work for me, though I can't speak for the OP.



Re: bash-4-2 issue

2024-01-11 Thread Chet Ramey
On 1/11/24 3:46 AM, Sam Kappen via Bug reports for the GNU Bourne Again 
SHell wrote:




Thanks.
I am using a Linux host with kernel version 4.x for cross building.
It looks like autoconf is not defining the "PGRP_PIPE" macro variable as
there is no check for linux kernel version 4.


The uname check works, and the current version just sets it unconditionally
for linux kernel versions that aren't 1 or 2. It's probably time to see why
the BASH_SYS_PGRP_SYNC autoconf test isn't working, since the uname check
isn't appropriate for cross-building.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: bash-4-2 issue

2024-01-11 Thread Chet Ramey

On 1/10/24 5:33 PM, Grisha Levit wrote:

On Mon, Jan 8, 2024 at 7:04 AM Sam Kappen via Bug reports for the GNU
Bourne Again SHell  wrote:

We see that bash throws the "Operation not permitted" error when doing
chained pipe operation
along with a debug trap.

We set a debug trap here "my_debug" to save the terminal commands entered.
The GNU bash, version used is  4.2.

root@freescale-p2020ds:~/dir#  ls -l | grep a | grep b | grep c
-sh: child setpgid (4238 to 4232): Operation not permitted


root@freescale-p2020ds:~/dir# trap
trap -- '' TSTP
trap -- '' TTIN
trap -- '' TTOU
trap -- 'my_debug' DEBUG
root@freescale-p2020ds:~/dir#

Platform: Linux 3.10 kernel on PPC target.

It seems setpgid is failing because the process group of the pipeline does
not exist at that time.

This issue is not seen on bash version 4.4.


I'm not sure this is fixed. In all versions, including 4.2, 4.4, 5.2, and the
current devel version, I see what seems to be the same error, triggered by a
pipeline when job control is enabled and the DEBUG trap executes an external
command.


I can't reproduce this on RHEL 9, which is running Linux kernel version 5.
PGRP_PIPE is defined because the BASH_SYS_PGRP_SYNC test enables it.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: document that read built-in can't return zero-length string in the middle of input

2024-01-11 Thread Chet Ramey

On 1/11/24 2:37 AM, ilya Basin wrote:

Dear.
I needed to read 16 bytes from a binary file and tried to replace a hexdump call with 
read built-in. I expected that with "-N1" if a NUL character is encountered 
bash would assign an empty string, however there's no indication that a NUL character was 
there and it simply assigns the next non-NUL character to the variable.


The read builtin skips over NUL characters because you can't store them
as part of the value of a shell variable. That seems obvious.

What would you like to see documented? That NUL characters don't count
towards the number of characters read to satisfy -N? Doesn't that follow
from the above?
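
For the use case in the report (noticing where the NUL bytes are), one
commonly used workaround, not something proposed in this message, is to
read one character at a time with NUL as the delimiter: an empty result
with a successful return status then means a NUL was read. A sketch
under those assumptions, with the hypothetical function name `hex_bytes`
and ASCII-only sample input:

```bash
hex_bytes() {
    local LC_ALL=C byte hex out=''
    # -d '' makes NUL the delimiter; -n 1 returns after one character.
    # IFS= keeps whitespace bytes such as newline from being stripped.
    while IFS= read -r -d '' -n 1 byte; do
        if [[ -z $byte ]]; then
            hex=00                      # read stopped at the NUL delimiter
        else
            printf -v hex '%02x' "'$byte"
        fi
        out+="$hex "
    done
    printf '%s\n' "${out% }"
}

printf 'A\0B\n' | hex_bytes
```

This is slow for large inputs, so hexdump, od or xxd remain the better
choice when they are available.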

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/





Re: bash-4-2 issue

2024-01-11 Thread Sam Kappen via Bug reports for the GNU Bourne Again SHell
On Thu, Jan 11, 2024 at 1:03 AM Grisha Levit  wrote:

> On Wed, Jan 10, 2024 at 5:33 PM Grisha Levit 
> wrote:
> > I'm not sure this is fixed. In all versions, including 4.2 [...]
> >
> > $ bash -m -c 'trap /usr/bin/true DEBUG; :|:'
> > bash: child setpgid (49581 to 49579): Operation not permitted
>
> Correction, versions prior to 4.3 did not respect the -m flag at
> invocation,
> so the command should be:
>
> bash -c 'set -m; trap /usr/bin/true DEBUG; :|:'
>

Thanks.
I am using a Linux host with kernel version 4.x for cross building.
It looks like autoconf is not defining the "PGRP_PIPE" macro variable as
there is no check for linux kernel version 4.
I don't see the error " child setpgid (4238 to 4232): Operation not
permitted" after I backport this patch
https://git.savannah.gnu.org/cgit/bash.git/diff/configure.ac?h=bash-4.4-testing&id=3bf257a5d95aa7d98d3da1a24be7b5b301716047
to bash-4.2


Re: Bash 5.2.0: Memory leak with $(

2024-01-11 Thread pourko--- via Bug reports for the GNU Bourne Again SHell
Jan 10, 2024, 15:58 by grishale...@gmail.com:

> On Mon, Jan 8, 2024, 12:26 <pou...@tutanota.com> wrote:
>
>> Do any of the other six patches in that report also apply to Bash 5.2?
>>
>
> Yes, all but the one for the `kv' builtin which did not exist yet. See 
> attached.
>
>>
>>
Nice!

Will this find its place in the official "bash-5.2-patches" folder?