printf builtin retains timezone from un-set TZ environment variable

2019-07-13 Thread Travis Everett
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: darwin17.7.0
Compiler: clang
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Darwin f638f97d 18.6.0 Darwin Kernel Version 18.6.0: Thu Apr
25 23:16:27 PD$
Machine Type: x86_64-apple-darwin17.7.0

Bash Version: 5.0
Patch Level: 7
Release Status: release

*Description:*
I've noticed that a timezone explicitly set in the TZ environment variable
can get cached (if any date-using command/builtin is run) and then show up
(at least) in dates produced by the printf builtin's "%(...)T" formats even
after the TZ variable is unset.

I played around with this for a bit and identified two ways to get the
timezone to re-set:

   - Run an external command. I have yet to find an external command that
   didn't correct it, so I suspect this behavior is intrinsic to the API calls
   made when running an external command. It's also possible I'm too
   unimaginative to find an external command that *doesn't* invoke date APIs
   :).
   - Set the TZ variable inline and run any builtin (i.e.,
   `TZ=:/etc/localtime :`).


*Repeat-By:*
 tzcache.sh 
echo $BASH_VERSION
export TZ=Antarctica/South_Pole
printf "TZ: $TZ\n\t%(%+)T\n"
unset TZ

# still prints the now-unset timezone
printf "TZ: $TZ\n\t%(%+)T\n"

# to reset it
# run any external command
# or any builtin with an inline TZ setting
TZ=:/etc/localtime :
printf "TZ: $TZ\n\t%(%+)T\n"


$ env -i $(which bash) --norc --noprofile tzcache.sh
5.0.7(1)-release
TZ: Antarctica/South_Pole
Sun Jul 14 05:21:49 NZST 2019
TZ:
Sun Jul 14 05:21:49 NZST 2019
TZ:
Sat Jul 13 12:21:49 CDT 2019


Cheers,
Travis


Re: Combination of "eval set -- ..." and $() command substitution is slow

2019-07-13 Thread astian
Dear Chet Ramey, thanks for the explanations, I assume you were able to
reproduce the issue.

First a warning that I forgot to include in the previous email: the quoted
excerpts of the gprof tables correspond to a slightly modified test script in
which I tried to magnify the overhead in order to more easily spot performance
differences.  Therefore, the absolute values displayed, e.g.  number of
function calls, are exaggerated.  The relative values (and hopefully your
conclusions) remain largely the same, though.  Sorry for any potential
confusion.

A few short comments prompted by your reply (probably not very useful, feel
free to ignore them).

Chet Ramey:
>> - $() seems generally slightly slower than ``, but becomes pathologically
>>   so when preceded with "eval set -- ...".
>
> It is slightly slower -- POSIX requires that the shell parse the contents
> of $(...) to determine that it's a valid script as part of finding the
> closing `)'. The rules for finding the closing "`" don't have that
> requirement.
>
>> - "eval set -- ..." itself doesn't seem slow at all, but obviously it has
>>   side-effects not captured by the "time" measurement tool.
>
> What happens is you end up with a 4900-character command string that you
> have to parse multiple times. But that's not the worst of it.

Since this statement ought to run exactly once, naïvely I would expect that by
"multiple times" you really mean at most "twice": once for the top-level
script, another time "inside" the eval "sub-script".

> The gprof output provides a clue.
>
>
>>   case 1 1 0 (pathological):
>>%   cumulative   self  self total
>>   time   seconds   secondscalls  us/call  us/call  name
>>   38.89  0.21 0.2128890 7.27 7.27  set_line_mbstate
>
> set_line_mbstate() runs through each command line before parsing, creating
> a bitmap that indicates whether each element is a single-byte character or
> part of a multi-byte character. The scanner uses this to determine whether
> a shell metacharacter should act as a delimiter or get skipped over as part
> of a multibyte character. For a single run with args `1 1 0', it gets
> called around 7300 times, with around 2400 of them for the 4900-character
> string with all the arguments.
>
> When you're in a multibyte locale (en_US.UTF-8 is one such), each one of
> those characters requires a call to mbrlen/mbrtowc. So that ends up being
> 2400 * 4900 calls to mbrlen.

I am indeed using an UTF-8 locale, but I tested also with export LC_ALL=C and 
the
behaviour did not change, I should have mentioned that.

Also, it occurs to me that since all shell metacharacters are surely 7-bit
ASCII, and all UTF-8 code units are strictly and deliberately non-ASCII 8-bit
bytes, at least in the case of UTF-8 it may not be necessary to construct such
bitmap: there cannot be a shell-metacharacter byte part of a UTF-8 sequence
representing something else, so there's no need for the distinction.  Of
course, this observation applies only to such specially crafted multibyte
encodings as UTF-8 (which nonetheless is surely by far the most common).
Furthermore, it does not explain the issue at hand.

> There is something happening here -- there's no way there should be that
> many calls to set_line_mbstate(),

Notice that there are almost as many calls (only 2 fewer) in case "0 1 0" (in
which eval is not used) yet in that case the performance is not harmed.
Quoting from the previous email:

  case 0 1 0:
   2.08  0.32 0.012 0.00 0.00  set_line_mbstate
  case 1 1 0 (pathological):
  38.89  0.21 0.2128890 7.27 7.27  set_line_mbstate

OTOH, all other test cases show less than 100 calls (these numbers correspond
to a profiling run of the given script, unmodified):

  $ grep -m1 set_line_mbstate gmon*.table
  gmon.out.0-0-X.table:  0.00  0.00 0.00   88 0.00 0.00  
set_line_mbstate
  gmon.out.0-1-0.table:  0.00  0.13 0.00 7288 0.00 0.00  
set_line_mbstate
  gmon.out.0-1-1.table:  0.00  0.18 0.00   88 0.00 0.00  
set_line_mbstate
  gmon.out.1-0-X.table:  0.00  0.01 0.00   90 0.00 0.00  
set_line_mbstate
  gmon.out.1-1-0.table: 37.50  0.06 0.06 7290 8.23 8.23  
set_line_mbstate
  gmon.out.1-1-1.table:  0.00  0.07 0.00   90 0.00 0.00  
set_line_mbstate

> even when you have to save and restore
> the input line because you have to parse the contents of $(). There must
> be some combination of the effect of `eval' on the line bitmap and the
> long string. I'll see what I can figure out.
>
> Chet

Cheers.



Re: background, pipeline, hang

2019-07-13 Thread Mike Jonkmans
On Fri, Jul 12, 2019 at 02:26:52PM -0400, Chet Ramey wrote:
> Date: Fri, 12 Jul 2019 14:26:52 -0400
> From: Chet Ramey 
> To: Mike Jonkmans , bug-bash@gnu.org,
>  b...@packages.debian.org
> Cc: chet.ra...@case.edu
> Subject: Re: background, pipeline, hang
> Reply-To: chet.ra...@case.edu
> 
> On 7/12/19 10:00 AM, Mike Jonkmans wrote:
> 
> > Bash Version: 4.4
> > Patch Level: 20
> > Release Status: release
> > 
> > Description:
> > A pipeline, with two or more pipes, hangs when a background job is 
> > running.
> > Only when the background job finishes, the shell prompts again.
> > Without the background job, the shell prompts immediately.
> > 
> > Repeat-By:
> > : | : | : ## immediately finishes
> > sleep 10 &
> > : | : | : ## this hangs (^C, ^\, ^Z do not help) until the sleep 
> > finishes
> 
> Sorry, I can't reproduce this.
> 
> Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/

Thanks for looking, though.

With the tip of Greg Wooledge (use --norc --noprofile),
from another post, i was able to research this further.


This runs for two seconds:
bash --norc --noprofile -ic "trap 'tput hs' DEBUG; sleep 2 & :|:|:"

While this immediately finishes:
bash --norc --noprofile -ic "trap ':' DEBUG; sleep 2 & :|:|:"


I also found something in the (ubuntu)changelog, that might be related.
Quotes from "apt-get changelog bash":
bash (4.3-6) unstable; urgency=medium
[...]
* Apply upstream patches 001 - 008:
[...]
- A change to save state while running the DEBUG trap caused pipelines 
to
  hang on systems which need process group synchronization while building
  pipelines.


It seems to have to do with the duration of the command executed in the DEBUG 
trap.
As this also exhibits an unexpected, longer duration:
bash --norc --noprofile -ic "trap 'sleep 1; echo $SECONDS' DEBUG; sleep 
5 & :|:|:"; echo $SECONDS

Compared to:
bash --norc --noprofile -ic "trap 'echo $SECONDS' DEBUG; sleep 5 & 
:|:|:"; echo $SECONDS


So this combination is problematic:
- trap on DEBUG where the command takes some time
- something running in the background
- a pipeline with two or more pipes


Hope you can make something of this.

Regards, Mike



Re: background, pipeline, hang

2019-07-13 Thread Mike Jonkmans
On Fri, Jul 12, 2019 at 10:14:19AM -0400, Greg Wooledge wrote:
> Date: Fri, 12 Jul 2019 10:14:19 -0400
> From: Greg Wooledge 
> To: Mike Jonkmans 
> Cc: bug-bash@gnu.org, b...@packages.debian.org
> Subject: Re: background, pipeline, hang
> 
> On Fri, Jul 12, 2019 at 04:00:57PM +0200, Mike Jonkmans wrote:
> > uname output: Linux sint 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
> > 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> (Why are you reporting problems on an Ubuntu system on a debian.org
> mailing list?)

Valid point.
I used bashbug. The email addresses were already filled in.
Should i leave out b...@packages.debian.org ?

> > Repeat-By:
> > : | : | : ## immediately finishes
> > sleep 10 &
> > : | : | : ## this hangs (^C, ^\, ^Z do not help) until the sleep 
> > finishes
> 
> I cannot reproduce this.  I tried bash 4.4 on Debian 10 (self-built),
> the standard bash 4.4 package (version 4.4-5) on Debian 9, and the standard
> bash 5.0 package (version 5.0-4) on Debian 10.

Thanks for trying.

> You might want to try running a new instance of bash with --norc --noprofile
> to remove any dot file settings from the picture.

This is a very good tip.
It might be useful to have this mentioned somewhere in bashbug.

I have narrowed it down thanks to this. Will use another post for that.

Thanks.